
<h1>Does the C++ standard mandate poor performance for iostreams, or am I just dealing with a poor implementation?</h1>
<p>Every time I mention the slow performance of C++ standard library iostreams, I am met with a wave of disbelief. Yet I have profiler results showing large amounts of time spent in iostream library code (with full compiler optimizations), and switching from iostreams to OS-specific I/O APIs and custom buffer management does give an order-of-magnitude improvement.</p>
<p>What extra work is the C++ standard library doing, is it required by the standard, and is it useful in practice? Or do some compilers provide implementations of iostreams that are competitive with manual buffer management?</p>
<h2>Benchmarks</h2>
<p>To get matters moving, I've written several short programs to exercise the internal buffering of iostreams:</p>
<ul>
<li>putting binary data into an <code>ostringstream</code> <a href="http://ideone.com/2PPYw" rel="noreferrer">http://ideone.com/2PPYw</a></li>
<li>putting binary data into a <code>char[]</code> buffer <a href="http://ideone.com/Ni5ct" rel="noreferrer">http://ideone.com/Ni5ct</a></li>
<li>putting binary data into a <code>vector&lt;char&gt;</code> using <code>back_inserter</code> <a href="http://ideone.com/Mj2Fi" rel="noreferrer">http://ideone.com/Mj2Fi</a></li>
<li><strong>NEW</strong>: <code>vector&lt;char&gt;</code> simple iterator <a href="http://ideone.com/9iitv" rel="noreferrer">http://ideone.com/9iitv</a></li>
<li><strong>NEW</strong>: putting binary data directly into <code>stringbuf</code> <a href="http://ideone.com/qc9QA" rel="noreferrer">http://ideone.com/qc9QA</a></li>
<li><strong>NEW</strong>: <code>vector&lt;char&gt;</code> simple iterator plus bounds check <a href="http://ideone.com/YyrKy" rel="noreferrer">http://ideone.com/YyrKy</a></li>
</ul>
<p>Note that the <code>ostringstream</code> and <code>stringbuf</code> versions run fewer iterations because they are so much slower.</p>
<p>On ideone, the <code>ostringstream</code> version is about 3 times slower than <code>std::copy</code> + <code>back_inserter</code> + <code>std::vector</code>, and about 15 times slower than <code>memcpy</code> into a raw buffer. This is roughly consistent with the before-and-after profiling I did when I switched my real application to custom buffering.</p>
<p>These are all in-memory buffers, so the slowness of iostreams can't be blamed on slow disk I/O, too much flushing, synchronization with stdio, or any of the other things people use to excuse the observed slowness of C++ standard library iostreams.</p>
<p>It would be nice to see benchmarks on other systems and commentary on what common implementations do (such as GCC's libstdc++, Visual C++, Intel C++), and how much of the overhead is mandated by the standard.</p>
<h2>Rationale for this test</h2>
<p>A number of people have correctly pointed out that iostreams are more commonly used for formatted output. However, they are also the only modern API the C++ standard provides for binary file access. More importantly, the reason for testing the internal buffering applies to typical formatted I/O as well: if iostreams can't keep the disk controller supplied with raw data, how can they possibly keep up when they are responsible for formatting too?</p>
<h2>Benchmark Timing</h2>
<p>All times are per iteration of the outer (<code>k</code>) loop.</p>
<p>On ideone (gcc-4.3.4, unknown OS and hardware):</p>
<ul>
<li><code>ostringstream</code>: 53 ms</li>
<li><code>stringbuf</code>: 27 ms</li>
<li><code>vector&lt;char&gt;</code> and <code>back_inserter</code>: 17.6 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 10.6 ms</li>
<li><code>vector&lt;char&gt;</code> iterator and bounds check: 11.4 ms</li>
<li><code>char[]</code>: 3.7 ms</li>
</ul>
<p>On my laptop (Visual C++ 2010 x86, <code>cl /Ox /EHsc</code>, Windows 7 Ultimate 64-bit, Intel Core i7, 8 GB RAM):</p>
<ul>
<li><code>ostringstream</code>: 73.4 ms, 71.6 ms</li>
<li><code>stringbuf</code>: 21.7 ms, 21.3 ms</li>
<li><code>vector&lt;char&gt;</code> and <code>back_inserter</code>: 34.6 ms, 34.4 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 1.10 ms, 1.04 ms</li>
<li><code>vector&lt;char&gt;</code> iterator and bounds check: 1.11 ms, 0.87 ms, 1.12 ms, 0.89 ms, 1.02 ms, 1.14 ms</li>
<li><code>char[]</code>: 1.48 ms, 1.57 ms</li>
</ul>
<p>Visual C++ 2010 x86 with Profile-Guided Optimization (<code>cl /Ox /EHsc /GL /c</code>, <code>link /ltcg:pgi</code>, run, <code>link /ltcg:pgo</code>, measure):</p>
<ul>
<li><code>ostringstream</code>: 61.2 ms, 60.5 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 1.04 ms, 1.03 ms</li>
</ul>
<p>Same laptop, same OS, using Cygwin gcc 4.3.4 <code>g++ -O3</code>:</p>
<ul>
<li><code>ostringstream</code>: 62.7 ms, 60.5 ms</li>
<li><code>stringbuf</code>: 44.4 ms, 44.5 ms</li>
<li><code>vector&lt;char&gt;</code> and <code>back_inserter</code>: 13.5 ms, 13.6 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 4.1 ms, 3.9 ms</li>
<li><code>vector&lt;char&gt;</code> iterator and bounds check: 4.0 ms, 4.0 ms</li>
<li><code>char[]</code>: 3.57 ms, 3.75 ms</li>
</ul>
<p>Same laptop, Visual C++ 2008 SP1, <code>cl /Ox /EHsc</code>:</p>
<ul>
<li><code>ostringstream</code>: 88.7 ms, 87.6 ms</li>
<li><code>stringbuf</code>: 23.3 ms, 23.4 ms</li>
<li><code>vector&lt;char&gt;</code> and <code>back_inserter</code>: 26.1 ms, 24.5 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 3.13 ms, 2.48 ms</li>
<li><code>vector&lt;char&gt;</code> iterator and bounds check: 2.97 ms, 2.53 ms</li>
<li><code>char[]</code>: 1.52 ms, 1.25 ms</li>
</ul>
<p>Same laptop, Visual C++ 2010 64-bit compiler:</p>
<ul>
<li><code>ostringstream</code>: 48.6 ms, 45.0 ms</li>
<li><code>stringbuf</code>: 16.2 ms, 16.0 ms</li>
<li><code>vector&lt;char&gt;</code> and <code>back_inserter</code>: 26.3 ms, 26.5 ms</li>
<li><code>vector&lt;char&gt;</code> with ordinary iterator: 0.87 ms, 0.89 ms</li>
<li><code>vector&lt;char&gt;</code> iterator and bounds check: 0.99 ms, 0.99 ms</li>
<li><code>char[]</code>: 1.25 ms, 1.24 ms</li>
</ul>
<p>EDIT: I ran everything twice to see how consistent the results were. Pretty consistent IMO.</p>
<p>NOTE: On my laptop, since I can spare more CPU time than ideone allows, I set the number of iterations to 1000 for all methods. This means that <code>ostringstream</code> and <code>vector</code> reallocation, which takes place only on the first pass, should have little impact on the final results.</p>
<p>EDIT: Oops, I found a bug in the <code>vector</code>-with-ordinary-iterator version: the iterator wasn't being advanced, so there were too many cache hits. I had been wondering how <code>vector&lt;char&gt;</code> was outperforming <code>char[]</code>. Fixing it didn't make much difference, though; <code>vector&lt;char&gt;</code> is still faster than <code>char[]</code> under VC++ 2010.</p>
<h2>Conclusions</h2>
<p>Buffering of output streams requires three steps each time data is appended:</p>
<ul>
<li>Check that the incoming block fits the available buffer space.</li>
<li>Copy the incoming block.</li>
<li>Update the end-of-data pointer.</li>
</ul>
<p>The latest code snippet I posted, "<code>vector&lt;char&gt;</code> simple iterator plus bounds check", not only does this; it also allocates additional space and moves the existing data when the incoming block doesn't fit. As Clifford pointed out, buffering in a file I/O class wouldn't have to do that; it would just flush the current buffer and reuse it. So this should be an upper bound on the cost of buffering output, and it's exactly what is needed to make a working in-memory buffer.</p>
<p>So why is <code>stringbuf</code> about 2.5 times slower on ideone, and at least 10 times slower when I test it? It isn't being used polymorphically in this simple micro-benchmark, so virtual-call overhead doesn't explain it.</p>
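<p>The three append steps listed under Conclusions can be sketched in C++ as follows. This is a hedged, minimal illustration, not the code behind the ideone links: <code>GrowableBuffer</code>, <code>fill_custom</code>, <code>fill_stringbuf</code>, <code>fill_ostringstream</code>, <code>time_ms</code>, <code>compare</code>, the 64-byte initial capacity, the doubling growth policy, and the 1 KiB payload are all assumptions made for this sketch. The same payload is also pushed through <code>std::stringbuf::sputn</code> and <code>std::ostringstream::write</code> so the three paths can be timed on equal terms; absolute numbers will vary with compiler, flags, and standard library, and a single timed run is only a rough indication.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory output buffer (not from the original benchmarks).
// append() performs the three steps: check fit, copy, advance end marker.
class GrowableBuffer {
public:
    void append(const char* data, std::size_t n) {
        if (n == 0) return;  // nothing to copy; also avoids memcpy on null
        // Step 1: does the incoming block fit in the free space?
        if (n > storage_.size() - used_) {
            // No: grow geometrically and keep the existing data. A file
            // buffer would flush and reuse its storage here instead.
            std::size_t cap = storage_.empty() ? 64 : storage_.size();
            while (cap < used_ + n) cap *= 2;
            storage_.resize(cap);
        }
        // Step 2: copy the incoming block.
        std::memcpy(storage_.data() + used_, data, n);
        // Step 3: update the end-of-data marker.
        used_ += n;
    }

    std::size_t size() const { return used_; }
    std::string str() const { return std::string(storage_.data(), used_); }

private:
    std::vector<char> storage_;  // storage_.size() serves as the capacity
    std::size_t used_ = 0;       // bytes of valid data
};

// The same workload through three sinks; each returns the final bytes.
std::string fill_custom(const std::string& block, int reps) {
    GrowableBuffer buf;
    for (int i = 0; i < reps; ++i) buf.append(block.data(), block.size());
    return buf.str();
}

std::string fill_stringbuf(const std::string& block, int reps) {
    std::stringbuf sb(std::ios_base::out);
    const auto n = static_cast<std::streamsize>(block.size());
    for (int i = 0; i < reps; ++i) sb.sputn(block.data(), n);
    return sb.str();
}

std::string fill_ostringstream(const std::string& block, int reps) {
    std::ostringstream os;
    const auto n = static_cast<std::streamsize>(block.size());
    for (int i = 0; i < reps; ++i) os.write(block.data(), n);
    return os.str();
}

// Wall-clock time of one fill in milliseconds (a single rough measurement).
template <class Fill>
double time_ms(Fill fill, const std::string& block, int reps) {
    const auto start = std::chrono::steady_clock::now();
    const std::string out = fill(block, reps);
    const auto stop = std::chrono::steady_clock::now();
    volatile std::size_t keep = out.size();  // keep the result observable
    (void)keep;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Prints one timing per approach for the same number of appends.
void compare(int reps) {
    const std::string block(1024, 'x');  // assumed 1 KiB payload per append
    std::printf("GrowableBuffer: %8.2f ms\n", time_ms(fill_custom, block, reps));
    std::printf("stringbuf:      %8.2f ms\n", time_ms(fill_stringbuf, block, reps));
    std::printf("ostringstream:  %8.2f ms\n", time_ms(fill_ostringstream, block, reps));
}
```

<p>To try it, call <code>compare(10000)</code> (or any repetition count) from <code>main</code>. Like the "simple iterator plus bounds check" version, <code>GrowableBuffer::append</code> grows and moves data when a block doesn't fit, so it models the in-memory case rather than a file buffer that would flush and reuse its storage.</p>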