<p><code>READAHEAD_BUFSIZE</code> is <em>only</em> used when you use the file as an iterator:</p>
<pre><code>for line in fileobj:
    print line
</code></pre>
<p>It is a separate buffer from the normal buffer argument, which is handled by the <code>fread</code> C API calls. Both are used when iterating.</p>
<p>From <a href="http://docs.python.org/2/library/stdtypes.html#file.next" rel="nofollow"><code>file.next()</code></a>:</p>
<blockquote> <p>In order to make a <code>for</code> loop the most efficient way of looping over the lines of a file (a very common operation), the <code>next()</code> method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining <code>next()</code> with other file methods (like <code>readline()</code>) does not work right. However, using <code>seek()</code> to reposition the file to an absolute position will flush the read-ahead buffer.</p> </blockquote>
<p>The OS buffer size is not changed; the <code>setvbuf</code> call is made when the file is opened and is not touched by the file iteration code. Instead, calls to <code>Py_UniversalNewlineFread</code> (which uses <code>fread</code>) are used to fill the read-ahead buffer, creating a <em>second</em> buffer internal to Python. Python otherwise leaves the regular buffering up to the C API calls (<code>fread()</code> calls are buffered; the userspace buffer is consulted by <code>fread()</code> to satisfy the request, so Python doesn't have to do anything about that).</p>
<p><code>readahead_get_line_skip()</code> then serves newline-terminated lines from this buffer. If the buffer no longer contains a newline, it refills the buffer by calling itself recursively with a buffer size 1.25 times the previous value. This means that file iteration can read the whole rest of the file into the memory buffer if there are no more newline characters in the file!</p>
<p>To see how much the buffer reads, print the file position (using <code>fileobj.tell()</code>) while looping:</p>
<pre><code>&gt;&gt;&gt; with open('test.txt') as f:
...     for line in f:
...         print f.tell()
...
8192   # 1 times the buffer size
8192
8192
~ lines elided
18432  # + 1.25 times the buffer size
18432
18432
~ lines elided
26624  # + 1 times the buffer size; the last newline must've aligned on the buffer boundary
26624
26624
~ lines elided
36864  # + 1.25 times the buffer size
36864
36864
</code></pre>
<p>etc.</p>
<p>Which bytes are actually read from the disk (provided <code>fileobj</code> is an actual physical file on your disk) depends not only on the interplay between the <code>fread()</code> buffer and the internal read-ahead buffer, but also on whether the OS itself is buffering. It could well be that even if the file buffer is exhausted, the OS serves the system call to read from the file from its own cache instead of going to the physical disk.</p>
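<p>The refill behaviour described above can be approximated with a small pure-Python model. This is only a sketch under stated assumptions, not CPython's implementation: the names <code>ReadAheadModel</code> and <code>trace</code> are hypothetical, the code is written for Python 3 against any binary file-like object, and it ignores universal-newline handling and the underlying <code>fread()</code> buffer.</p>

```python
import io

READAHEAD_BUFSIZE = 8192  # same value as the constant discussed above


class ReadAheadModel:
    """Hypothetical model of the read-ahead logic; not the real C code."""

    def __init__(self, raw):
        self.raw = raw    # binary file-like object with read()
        self.buf = None   # current read-ahead buffer, or None when dropped
        self.pos = 0      # offset of the next unserved byte in self.buf

    def next_line(self):
        # Every call starts again from the base buffer size.
        return self._get_line(READAHEAD_BUFSIZE)

    def _get_line(self, bufsize):
        if self.buf is None:
            self.buf = self.raw.read(bufsize)
            self.pos = 0
        remaining = self.buf[self.pos:]
        if not remaining:
            # Nothing left to read: end of file.
            self.buf = None
            return b""
        newline = remaining.find(b"\n")
        if newline != -1:
            line = remaining[:newline + 1]
            self.pos += newline + 1
            if self.pos == len(self.buf):
                self.buf = None  # buffer fully consumed, drop it
            return line
        # No newline left in the buffer: keep the partial line and
        # recurse with a buffer 1.25 times as large.
        self.buf = None
        return remaining + self._get_line(bufsize + (bufsize >> 2))


def trace(raw):
    """Return all lines plus the underlying position after each one."""
    model = ReadAheadModel(raw)
    lines, positions = [], []
    while True:
        line = model.next_line()
        if not line:
            break
        lines.append(line)
        positions.append(raw.tell())
    return lines, positions


if __name__ == "__main__":
    data = (b"a" * 99 + b"\n") * 300  # 300 lines of 100 bytes each
    _, positions = trace(io.BytesIO(data))
    print(sorted(set(positions)))
```

<p>With 300 lines of 100 bytes each, the distinct positions this model reports are 8192, 18432, 28672 and 30000: every refill after the first is triggered by a partial line, so it uses the enlarged 10240-byte buffer. For input with no newlines at all, a single call consumes the entire stream.</p>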