StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
19126831
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-10-01T22:09:13.093
FavoriteCount
0
LastActivityDate
2013-10-03T01:59:11.813
LastEditDate
2013-10-03T01:59:11.813
LastEditorUserId
13005
OwnerUserId
13005
ParentId
19126228
PostTypeId
2
Score
5
ViewCount
0
LastEditorDisplayName
text
Body
With some variation according to which level of cache you're talking about, cache works as follows: <ul> <li>if the data is already in cache then it is fast to access</li> <li>if the data is not in cache then you incur a cost, but an entire cache line (or page, if we're talking RAM vs swap file rather than cache vs RAM) is brought into cache, so access close to the missed address will not miss.</li> <li>if you're lucky then the memory subsystem will detect sequential access and pre-fetch data that it thinks you're about to need.</li> </ul> So naively the questions to ask are: <ol> <li>how many cache misses occur? -- B wins, because in A you fetch some unused data per record, whereas in B you fetch none other than a small rounding error at the end of the iteration. So in order to visit all of the necessary data, B fetches fewer cache lines, assuming a significant number of records. If the number of records is insignificant, then cache performance may have little or nothing to do with the performance of your code, because a program that uses a small enough amount of data will find that it's all in cache all the time.</li> <li>is the access sequential? -- yes in both cases, although this might be harder to detect in case B because there are two interleaved sequences rather than just one.</li> </ol> So, I would sort of expect B to be faster for this code. However: <ul> <li>if this is the only access to the data, then you could speed up A by removing most of the data members from the <code>struct</code>. So do that. Presumably in fact it is not the only access to the data in your program, and the other accesses might affect performance in two ways: the time they actually take, and whether they populate the cache with the data you need.</li> <li>what I expect and what actually happens are frequently different things, and there is little point relying on speculation if you have any ability to test it. In the best case, the sequential access means that there are no cache misses in either code. Testing performance requires no special tool (although they can make it easier), just a clock with a second hand. At a pinch, fashion a pendulum from your phone charger.</li> <li>there are some complications I have ignored. Depending on hardware, if you're unlucky with B then at the lowest cache level you could find that the accesses to one vector are evicting the accesses to the other vector, because the corresponding memory just happens to use the same location in cache. This would cause two cache misses per record. This will only happen on what's called "direct-mapped cache". "Two-way cache" or better would save the day, by allowing chunks of both vectors to co-exist even if their first preference location in cache is the same. I don't think that PC hardware generally uses direct-mapped cache, but I don't know for sure and I don't know much about GPUs.</li> </ul>
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POWhich is most cache friendly?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USSteve Jessop
UserOwnerUserId
1. USSteve Jessop
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POWhich is most cache friendly?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COModern Intel CPUs (Sandy/Ivy Bridge) have 8-way L1 and L2 cache and 12-way L3. Not sure about AMD. I'm also pretty sure most ARM processors with more than 4k L1 cache are 4-way.
 singulars
 PostPostId
 PO
 UserUserId
 USMart

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.