<p>After digging through the source a bit more and working out how <code>setvbuf</code> and <code>fread</code> work, I think I understand how <code>buffering</code> and <code>READAHEAD_BUFSIZE</code> relate to each other: when iterating through a file, a buffer of <code>READAHEAD_BUFSIZE</code> bytes is filled as lines are requested, but filling this buffer uses calls to <code>fread</code>, each of which fills a buffer of <code>buffering</code> bytes.</p>
<p>Python's <code>read</code> is implemented as <a href="http://hg.python.org/cpython/file/84cd07899baf/Objects/fileobject.c#l1052" rel="nofollow">file_read</a>, which calls <a href="http://hg.python.org/cpython/file/84cd07899baf/Objects/fileobject.c#l2809" rel="nofollow">Py_UniversalNewlineFread</a>, passing it the number of bytes to read as <code>n</code>. <code>Py_UniversalNewlineFread</code> then eventually calls <code>fread</code> to read <code>n</code> bytes.</p>
<p>When you iterate over a file, the function <a href="http://hg.python.org/cpython/file/84cd07899baf/Objects/fileobject.c#l2269" rel="nofollow">readahead_get_line_skip</a> is what retrieves a line. This function also calls <code>Py_UniversalNewlineFread</code>, passing <code>n = READAHEAD_BUFSIZE</code>. So this eventually becomes a call to <code>fread</code> for <code>READAHEAD_BUFSIZE</code> bytes.</p>
<p>So now the question is: how many bytes does <code>fread</code> actually read from disk? If I run the following code in C, then 1024 bytes get copied into <code>buf</code> and 512 into <code>buf2</code>. (This might be obvious, but never having used <code>setvbuf</code> before, it was a useful experiment for me.)</p>
<pre><code>FILE *f = fopen("test.txt", "r");
void *buf = malloc(1024);   /* stdio buffer handed to setvbuf */
void *buf2 = malloc(512);   /* destination for fread */
setvbuf(f, buf, _IOFBF, 1024);
fread(buf2, 512, 1, f);
</code></pre>
<p>So, finally, this suggests to me that when iterating over a file, <em>at least</em> <code>READAHEAD_BUFSIZE</code> bytes are read from disk, but it might be more. I think that the first iteration of <code>for line in f</code> will read x bytes, where x is the smallest multiple of <code>buffering</code> that is no smaller than <code>READAHEAD_BUFSIZE</code>.</p>
<p>If anyone can confirm that this is what's actually going on, that would be great!</p>
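<p>One way to check the "how many bytes does <code>fread</code> actually pull in" question directly is to compare the size of the request with how far the underlying file descriptor has advanced afterwards. The following is a minimal sketch, assuming a POSIX system (it relies on <code>fileno</code> and <code>lseek</code>); the helper name <code>bytes_pulled_from_fd</code> and the scratch file path are hypothetical, and the exact number it reports depends on the C library in use, so treat it as an experiment rather than a statement about what any particular libc does.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of CPython or libc).
 * Writes a scratch file of file_size bytes at path, reopens it with a
 * fully buffered stdio stream whose buffer is buf_size bytes, asks fread
 * for `request` bytes, and returns how far the underlying file descriptor
 * advanced. Returns -1 on any error. The scratch file is removed afterwards.
 */
long bytes_pulled_from_fd(const char *path, size_t file_size,
                          size_t buf_size, size_t request)
{
    assert(request <= file_size);

    /* Create the scratch file. */
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;
    for (size_t i = 0; i < file_size; i++)
        fputc('x', out);
    if (fclose(out) != 0)
        return -1;

    FILE *f = fopen(path, "rb");
    char *stream_buf = malloc(buf_size);
    char *dest = malloc(request);
    long offset = -1;

    /* setvbuf must be called before any other operation on the stream. */
    if (f != NULL && stream_buf != NULL && dest != NULL &&
        setvbuf(f, stream_buf, _IOFBF, buf_size) == 0) {
        size_t got = fread(dest, 1, request, f);
        if (got == request) {
            /* Position of the descriptor, not of the stdio stream. */
            offset = (long)lseek(fileno(f), 0, SEEK_CUR);
        }
    }

    if (f != NULL)
        fclose(f);          /* close before freeing the stream buffer */
    free(stream_buf);
    free(dest);
    remove(path);
    return offset;
}
```

<p>Calling <code>bytes_pulled_from_fd("readahead_demo.tmp", 4096, 1024, 512)</code> mirrors the experiment above: a 1024-byte stdio buffer and a 512-byte request. If the returned offset is larger than 512, the library read ahead past what was asked for, which is the behaviour the hypothesis about <code>buffering</code> and <code>READAHEAD_BUFSIZE</code> depends on.</p>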