<p>mmap is <em>way</em> faster. You might write a simple benchmark to prove it to yourself:</p>
<pre><code>#include &lt;fstream&gt;

char data[0x1000];
std::ifstream in("file.bin", std::ios::binary);

// Read the file one 4 KiB block at a time; the last block may be short.
while (in.read(data, sizeof data) || in.gcount() &gt; 0) {
    // do something with the in.gcount() bytes in data
}
</code></pre>
<p>versus:</p>
<pre><code>#include &lt;fcntl.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

const off_t file_size = something;
const size_t page_size = 0x1000;  // assumes 4 KiB pages
off_t off = 0;
void *data;

int fd = open("filename.bin", O_RDONLY);

while (off &lt; file_size) {
    // POSIX requires MAP_PRIVATE or MAP_SHARED in the flags argument
    data = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE, fd, off);
    if (data == MAP_FAILED)
        break;
    // do stuff with data
    munmap(data, page_size);
    off += page_size;
}
close(fd);
</code></pre>
<p>Clearly, I'm leaving out details (like how to determine when you reach the end of the file in the event that your file size isn't a multiple of <code>page_size</code>, for instance), but it really shouldn't be much more complicated than this.</p>
<p>If you can, you might try to break up your data into multiple files that can be mmap()-ed in whole instead of in part (much simpler).</p>
<p>A couple of months ago I had a half-baked implementation of a sliding-window mmap()-ed stream class for boost_iostreams, but nobody cared and I got busy with other stuff. Most unfortunately, I deleted an archive of old unfinished projects a few weeks ago, and that was one of the victims :-(</p>
<p><strong>Update</strong>: I should also add the caveat that this benchmark would look quite different on Windows because Microsoft implemented a nifty file cache that does most of what you would do with mmap in the first place.
I.e., for frequently-accessed files, you could just call <code>std::ifstream::read()</code> and it would be as fast as mmap, because the file cache would have already done a memory-mapping for you, and it's transparent.</p>
<p><strong>Final Update</strong>: Look, people: across a lot of different platform combinations of OS and standard libraries and disks and memory hierarchies, I can't say for certain that the system call <code>mmap</code>, viewed as a black box, will always always always be substantially faster than <code>read</code>. That wasn't exactly my intent, even if my words could be construed that way. <strong>Ultimately, my point was that memory-mapped i/o is generally faster than byte-based i/o; this is still true</strong>. If you find experimentally that there's no difference between the two, then the only explanation that seems reasonable to me is that your platform implements memory-mapping under the covers in a way that is advantageous to the performance of calls to <code>read</code>. The only way to be absolutely certain that you're using memory-mapped i/o in a portable way is to use <code>mmap</code>. If you don't care about portability and you can rely on the particular characteristics of your target platforms, then using <code>read</code> may be suitable without measurably sacrificing performance.</p>
<p><strong>Edit to clean up answer list:</strong> @jbl:</p>
<blockquote> <p>the sliding window mmap sounds interesting. Can you say a little more about it?</p> </blockquote>
<p>Sure - I was writing a C++ library for Git (a libgit++, if you will), and I ran into a similar problem to this: I needed to be able to open large (very large) files and not have performance be a total dog (as it would be with <code>std::fstream</code>).</p>
<p><code>Boost::Iostreams</code> already has a mapped_file Source, but the problem was that it was <code>mmap</code>ping whole files, which limits you to 2^(wordsize) bytes. On 32-bit machines, 4GB isn't big enough.
It's not unreasonable to expect to have <code>.pack</code> files in Git that become much larger than that, so I needed to read the file in chunks without resorting to regular file i/o. Under the covers of <code>Boost::Iostreams</code>, I implemented a Source, which is more or less another view of the interaction between <code>std::streambuf</code> and <code>std::istream</code>. You could also try a similar approach by deriving a <code>mapped_filebuf</code> from <code>std::filebuf</code> and, similarly, a <code>mapped_fstream</code> from <code>std::fstream</code>. It's the interaction between the two that's difficult to get right. <code>Boost::Iostreams</code> has some of the work done for you, and it also provides hooks for filters and chains, so I thought it would be more useful to implement it that way.</p>
 
