<p>Without knowing anything else about your problem or your current implementation, one (somewhat) easy way to improve performance (to some extent) is to manually prefetch the values that your "sum" function is going to operate on.</p>

<p>Ignoring architectural and compiler nuances for now, manual prefetching could look like this:</p>

<pre><code>SmallStruct values [value_count] = {/*whatever*/};
int indices [index_count] = {/*whatever*/};
...
SmallStruct v = values[indices[0]];
for (int i = 1; i &lt; index_count; ++i)
{
    SmallStruct v_next = values[indices[i]];
    DoSomethingWith (v); // Note the *v*
    v = v_next;          // You don't want to copy, but this is the simplest form
}
DoSomethingWith (v);     // Do the final item
</code></pre>

<p>The above is the simplest possible form of prefetching. You can unroll the loop a little to avoid the copying mentioned above, and you probably want to issue more than a single prefetch.</p>

<p>This optimization works because most (all?) modern architectures can have more than one memory request in flight, which means those requests overlap and the average waiting time for those (presumably uncached) requests is divided by their concurrency (which is a good thing!). So it wouldn't matter how many unused cache lines you have; <em>the important factor is the number of concurrent memory reads the memory system can sustain at any given time</em>.</p>

<p><strong>A Note on the Effect of Cache Lines</strong></p>

<p>The above (admittedly simplistic) code ignores two very important facts: the entire <code>SmallStruct</code> cannot be read in one memory access (from the CPU's point of view), which is a bad thing, and memory is always read in units of cache lines (64 or 128 bytes, these days) anyway, which is very good!</p>

<p>So, instead of trying to read the entire <code>values[indices[i]]</code> into <code>v_next</code>, we can read just one byte; assuming the <code>values</code> array is properly aligned, a significant amount of memory (one full cache line) will be loaded and at hand for eventual processing.</p>

<p>Two important points:</p>

<ol>
<li>If your <code>SmallStruct</code> is not in fact small and won't fit entirely in a cache line, rearrange its members to make sure that the parts required in <code>DoSomethingWith()</code> are contiguous, packed, and fit in one cache line. If they still don't fit, consider separating your algorithm into two or more passes, each operating on data that fits in one cache line.</li>
<li>If you just read one byte (or one word, or whatever) from the next value you'll be accessing, make sure the compiler doesn't optimize that read away!</li>
</ol>

<p><strong>Alternate Implementations</strong></p>

<p>The second point above can be expressed in code, like this:</p>

<pre><code>touch (&amp;values[indices[0]]);
for (int i = 0; i &lt; index_count; ++i)
{
    if (i + 1 &lt; index_count)
        touch (&amp;values[indices[i + 1]]);
    DoSomethingWith (values[indices[i]]);
}
</code></pre>

<p>The <code>touch()</code> function is semantically like this (although the real implementation would probably be more involved). The <code>volatile</code> cast keeps the compiler from optimizing the read away:</p>

<pre><code>void touch (void * p)
{
    char c = *(volatile char *)p;
    (void)c; // Suppress the unused-variable warning
}
</code></pre>

<p>To prefetch more than one value, you'd do something like this: (<em>Update</em>: I changed my code to (I believe) a better implementation.) Note that the warm-up loop is guarded against <code>index_count</code> being smaller than <code>PrefetchCount</code>:</p>

<pre><code>const int PrefetchCount = 3;

// Get the ball rolling...
for (int j = 0; j &lt; PrefetchCount &amp;&amp; j &lt; index_count; ++j)
    touch (&amp;values[indices[j]]);

for (int i = 0; i &lt; index_count; ++i)
{
    if (i + PrefetchCount &lt; index_count)
        touch (&amp;values[indices[i + PrefetchCount]]);
    DoSomethingWith (values[indices[i]]);
}
</code></pre>

<p>Again, note that all the implementations above are very simple and simplistic. Also, if you prefetch too much, you can blow your L1 cache and your performance with it.</p>

<p><strong>Doing the Actual Prefetch</strong></p>

<p>x86-64 CPUs have an instruction you can use to ask the CPU to prefetch a cache-line-worth of memory data into its caches. More precisely, with this instruction you give a <em>hint</em> to the CPU that a particular memory location is going to be used by your application, and the CPU will try to bring it into cache. If you do this soon enough, the data will be ready by the time you need it and your computations won't stall.</p>

<p>The instruction is <code>PREFETCH*</code>, and you can use compiler-specific intrinsics instead of resorting to assembly. The intrinsic is called <code>_mm_prefetch</code> for the Microsoft and Intel C++ compilers, and <code>__builtin_prefetch</code> on GCC. (If you end up using this, just remember that you want the lowest level of prefetching, i.e. <code>T0</code>.)</p>

<p>Note that these go into the implementation of the <code>touch</code> function I used above.</p>

<p>I know of no library that does this in a reusable way. Also, I have no familiarity with C# libraries to know whether these are available there or not.</p>