StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
6811063
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2011-07-25T01:41:59.283
FavoriteCount
0
LastActivityDate
2011-07-25T02:04:36.100
LastEditDate
2017-05-23T11:52:50.030
LastEditorUserId
-1
OwnerUserId
53543
ParentId
6804568
PostTypeId
2
Score
8
ViewCount
0
LastEditorDisplayName
text
Body
I'm going to answer the question I think you really meant to ask, which is "should <code>push_back()</code> be avoided in the inner loops of heavy algorithms?" rather than what others seem to have read into your post, which is "does it matter if I call push_back before doing an unrelated sort on a large vector?" Also, I'm going to answer from my experience rather than spend time chasing down citations and peer-reviewed articles. Your example is basically doing two things that add up to the total CPU cost: it's reading and operating on elements in the input vector, and then it has to insert the elements into the output vector. You're concerned about the cost of inserting elements because: <ol> <li>push_back() is constant time (instantaneous, really) when a vector has enough space pre-reserved for an additional element, but slow when you've run out of reserved space. </li> <li>Allocating memory is costly (<a href="https://stackoverflow.com/questions/470683/memory-allocation-deallocation-bottleneck"><code>malloc()</code> is just slow</a>, even when pedants pretend that <code>new</code> is something different)</li> <li>Copying a vector's data from one region to another after reallocation <a href="https://stackoverflow.com/questions/4008128/one-large-malloc-versus-multiple-smaller-reallocs">is also slow</a>: when push_back() finds it hasn't got enough space, <a href="http://www.tantalon.com/pete/files/gdc04_common_cpp_mistakes_in_games.ppt" rel="nofollow noreferrer">it has to go and allocate a bigger vector, then copy all the elements</a>. (In theory, for vectors that are many OS pages in size, a magic implementation of the STL could use the VMM to move them around in the virtual address space without copying — in practice <a href="http://www.gamedev.net/topic/505481-do-stl-containers-always-move-memory/" rel="nofollow noreferrer">I've never seen one that could</a>.)</li> <li>Over-allocating the output vectors causes problems: it causes fragmentation, making future allocations slower; it burns data cache, making everything slower; if persistent, it ties up scarce free memory, leading to disk paging on a PC and a crash on embedded platforms.</li> <li>Under-allocating the output vectors causes problems because reallocating a vector is an O(n) operation, so reallocating it m times is O(m×n). If the STL's default allocator uses exponential reallocation (making the vector's reserve twice its previous size every time you realloc), that makes your linear algorithm O(n + n log m).</li> </ol> Your instinct, therefore, is correct: always pre-reserve space for your vectors where possible, not because push_back is slow, but because it can trigger a reallocation that is slow. Also, if you look at the implementation of <code>shrink_to_fit</code>, you'll see it also does a copy reallocation, temporarily doubling your memory cost and causing further fragmentation. Your problem here is that you don't always know exactly how much space you'll need for the output vectors; the usual response is to use a heuristic and maybe a custom allocator. Reserve n/2+k of the input size for each of your output vectors by default, where k is some safety margin. That way you'll usually have enough space for the output, so long as your input is reasonably balanced, and push_back can reallocate in the rare cases where it's not. If you find that push_back's exponential behavior is wasting too much memory ( causing you to reserve 2n elements when really you just needed n+2 ), you can give it a custom allocator that expands the vector size in smaller, linear chunks — but of course that will be much slower in cases where the vectors are really unbalanced and you end up doing lots of resizes. There's no way to always reserve the exact right amount of space without walking the input elements in advance; but if you know what the balance usually looks like, you can use a heuristic to make a good guess at it for a statistical performance gain over many iterations.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POBeing Smart About Vector Memory Allocation
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USCrashworks
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POBeing Smart About Vector Memory Allocation
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
1. CO+1 for the shrink_to_fit-does-a-copy. Just looked at the Microsoft implementation and, indeed...
 singulars
 PostPostId
 PO
 UserUserId
 USGabriel

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.