Note that there are some explanatory texts on larger screens.

plurals
  1. POWhy is calling vector.reserve(required + 1) faster than vector.reserve(required)?
    primarykey
    data
    text
    <p>I am doing some tests measuring the performance of standard containers under various conditions, and I came across something odd. When I am inserting many items into the middle of a <code>std::vector</code>, if I first call reserve with the exact number of elements that I will be adding, I see essentially no performance difference in most circumstances compared with not calling reserve, which is surprising. More surprising, however, is that if I call reserve with the exact number of elements I need <code>+ 1</code>, then I get a significant performance improvement. This is a sample table of results that I just got (all time are in seconds):</p> <pre><code>+---------------+--------+-------------------+-----------------------+ | # of elements | vector | vector (reserved) | vector (reserved + 1) | +---------------+--------+-------------------+-----------------------+ | 10000 | 0.04 | 0.04 | 0.03 | | 20000 | 0.14 | 0.14 | 0.11 | | 30000 | 0.32 | 0.32 | 0.25 | | 40000 | 0.55 | 0.55 | 0.44 | | 50000 | 0.87 | 0.85 | 0.66 | | 60000 | 1.24 | 1.24 | 0.96 | | 70000 | 1.69 | 1.68 | 1.31 | | 80000 | 2.17 | 2.21 | 1.71 | | 90000 | 2.78 | 2.75 | 2.16 | | 100000 | 3.43 | 3.44 | 2.68 | | 110000 | 4.13 | 4.15 | 3.23 | | 120000 | 4.88 | 4.89 | 3.86 | | 130000 | 5.79 | 5.8 | 4.51 | | 140000 | 6.71 | 6.71 | 5.24 | | 150000 | 7.7 | 7.7 | 6.02 | | 160000 | 8.76 | 8.67 | 6.86 | | 170000 | 9.9 | 9.91 | 7.74 | | 180000 | 11.07 | 10.98 | 8.64 | | 190000 | 12.34 | 12.35 | 9.64 | | 200000 | 13.64 | 13.56 | 10.72 | | 210000 | 15.1 | 15.04 | 11.67 | | 220000 | 16.59 | 16.41 | 12.89 | | 230000 | 18.05 | 18.06 | 14.13 | | 240000 | 19.64 | 19.74 | 15.36 | | 250000 | 21.34 | 21.17 | 16.66 | | 260000 | 23.08 | 23.06 | 18.02 | | 270000 | 24.87 | 24.89 | 19.42 | | 280000 | 26.5 | 26.58 | 20.9 | | 290000 | 28.51 | 28.69 | 22.4 | | 300000 | 30.69 | 30.74 | 23.97 | | 310000 | 32.73 | 32.81 | 25.57 | | 320000 | 34.63 | 34.99 | 27.28 | | 330000 | 37.12 | 37.17 | 28.99 | | 340000 | 39.36 | 39.43 | 30.83 | | 350000 | 41.7 | 41.48 | 32.45 | | 360000 | 44.11 | 44.22 | 34.55 | | 370000 | 46.62 | 46.71 | 36.22 | | 380000 | 49.09 | 48.91 | 38.46 | | 390000 | 51.71 | 51.98 | 40.22 | | 400000 | 54.45 | 54.56 | 43.03 | | 410000 | 57.23 | 57.29 | 44.84 | | 420000 | 60 | 59.73 | 46.67 | | 430000 | 62.9 | 63.03 | 49.3 | +---------------+--------+-------------------+-----------------------+ </code></pre> <p>I checked the implementation, and it doesn't appear to have an off-by-one error. I then further tested by printing the size and the capacity immediately after calling reserve, and then I printed them again after filling up the vector, and everything looks good.</p> <pre><code>before: size: 0 capacity: 10000 after: size: 10000 capacity: 10000 before: size: 0 capacity: 20000 after: size: 20000 capacity: 20000 ... </code></pre> <p>Compiler is gcc 4.7.2 on Fedora Linux x86_64. Compiler options are <code>-std=c++11 -Ofast -march=native -funsafe-loop-optimizations -flto=4 - fwhole-program</code></p> <p>The code is below.</p> <pre><code>#include &lt;algorithm&gt; #include &lt;array&gt; #include &lt;cstdint&gt; #include &lt;vector&gt; #include &lt;random&gt; #include &lt;string&gt; #include &lt;iostream&gt; #include &lt;fstream&gt; #include &lt;boost/timer.hpp&gt; namespace { constexpr size_t array_size = 1; unsigned number() { static std::random_device rd; static std::mt19937 random_engine(rd()); static std::uniform_int_distribution&lt;uint32_t&gt; distribution(0, std::numeric_limits&lt;uint32_t&gt;::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } std::string to_string() const { return std::to_string(x[0]); } inline friend bool operator&lt;=(Class const &amp; lhs, Class const &amp; rhs) { return lhs.x[0] &lt;= rhs.x[0]; } private: std::array&lt;uint32_t, array_size&gt; x; }; template&lt;typename Container&gt; void add(Container &amp; container, Class const &amp; value) { auto const it = std::find_if(std::begin(container), std::end(container), [&amp;](Class const &amp; c) { return value &lt;= c; }); container.emplace(it, value); } // Do something with the result template&lt;typename Container&gt; void insert_to_file(Container const &amp; container) { std::fstream file("file.txt"); for (auto const &amp; value : container) { file &lt;&lt; value.to_string() &lt;&lt; '\n'; } } template&lt;typename Container&gt; void f(std::vector&lt;Class&gt; const &amp; values) { Container container; container.reserve(values.size()); for (auto const &amp; value : values) { add(container, value); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : std::stoul(argv[1]); // Default constructor of Class fills in values here std::vector&lt;Class&gt; const values_to_be_copied(size); typedef std::vector&lt;Class&gt; Container; boost::timer timer; f&lt;Container&gt;(values_to_be_copied); std::cerr &lt;&lt; "Finished in " &lt;&lt; timer.elapsed() &lt;&lt; " seconds.\n"; } </code></pre> <p>I created a C++03 version to try and help other people reproduce it, but I cannot reproduce it in this version, despite trying to make it show the problem by making it as direct of a translation as possible:</p> <pre><code>#include &lt;algorithm&gt; #include &lt;cstdlib&gt; #include &lt;fstream&gt; #include &lt;vector&gt; #include &lt;string&gt; #include &lt;iostream&gt; #include &lt;boost/array.hpp&gt; #include &lt;boost/cstdint.hpp&gt; #include &lt;boost/lexical_cast.hpp&gt; #include &lt;boost/random/mersenne_twister.hpp&gt; #include &lt;boost/random/uniform_int_distribution.hpp&gt; #include &lt;boost/timer.hpp&gt; namespace { unsigned number() { static boost::random::mt19937 random_engine; static boost::random::uniform_int_distribution&lt;boost::uint32_t&gt; distribution(0, std::numeric_limits&lt;boost::uint32_t&gt;::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } inline friend bool operator&lt;=(Class const &amp; lhs, Class const &amp; rhs) { return lhs.x[0] &lt;= rhs.x[0]; } std::string to_string() const { return boost::lexical_cast&lt;std::string&gt;(x[0]); } private: boost::array&lt;boost::uint32_t, 1&gt; x; }; class Less { public: Less(Class const &amp; c): value(c) { } bool operator()(Class const &amp; c) const { return value &lt;= c; } private: Class value; }; void add(std::vector&lt;Class&gt; &amp; container, Class const &amp; value) { std::vector&lt;Class&gt;::iterator it = std::find_if(container.begin(), container.end(), Less(value)); container.insert(it, value); } // Do something with the result void insert_to_file(std::vector&lt;Class&gt; const &amp; container) { std::fstream file("file.txt"); for (std::vector&lt;Class&gt;::const_iterator it = container.begin(); it != container.end(); ++it) { file &lt;&lt; it-&gt;to_string() &lt;&lt; '\n'; } } void f(std::vector&lt;Class&gt; const &amp; values) { std::vector&lt;Class&gt; container; container.reserve(values.size() + 1); for (std::vector&lt;Class&gt;::const_iterator it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : boost::lexical_cast&lt;std::size_t&gt;(argv[1]); // Default constructor of Class fills in values here std::vector&lt;Class&gt; const values_to_be_copied(size); boost::timer timer; f(values_to_be_copied); std::cerr &lt;&lt; "Finished in " &lt;&lt; timer.elapsed() &lt;&lt; " seconds.\n"; } </code></pre> <p>The line that currently calls reserve was changed to include a <code>+ 1</code> or was completely removed, depending on which test I was running. The entire thing was run from a shell script that started at 10000 elements and increased up to 430000 elements, going one version at a time.</p> <p>My processor is an Intel i5 4-core processor, and I have 4 GiB of memory. I will try and simplify the C++11 version of the code as much as possible to see if I can isolate the problem.</p> <p>Does anyone know why reserving one more element than I need is causing this increase in speed?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload