Seeking to middle of file in Python
<p>I'm writing a program to search for a specific line in a <em>very</em> large (unordered) file, so it would be <em>preferred</em> not to load the entire file into memory.</p>
<p>I'm using multithreading to speed up the process. I want to give each thread its own part of the file, i.e., the first thread would run through the first quarter of the file, the second thread would scan (simultaneously) from the point where the first thread stops, and so on.</p>
<p>To do this I need to find the byte locations of different parts of the file. For simplicity, let's say I just want to find the middle of the file. The problem is that each line has a different length, so if I just do</p>
<pre><code>fo.seek(0, 2)      # jump to the end of the file
end = fo.tell()    # the end offset is the file size in bytes
mid = end // 2
fo.seek(mid, 0)
</code></pre>
<p>it could land me in the middle of a line. So I need a way to seek to the next or previous newline. Also, note that I don't want the <em>exact</em> middle, just somewhere around it (since it's a very large file).</p>
<p>Here's what I was able to code. I'm not sure whether this loads the file into memory or not. I would also really like to avoid opening two instances of the same file (I did so in my program because I didn't want to worry about the offset changing when I read the file).</p>
<p>Any modification (or a new program) that is faster would be appreciated.</p>
<pre><code># "rw+" is not a valid mode; the file is only read here, so open it as binary read-only
fo = open(filename, "rb")
f2 = open(filename, "rb")

file_ = dict()

fo.seek(0, 2)
file_['end'] = fo.tell()
file_['mid'] = file_['end'] // 2   # integer division keeps the offset an int

fo.seek(file_['mid'], 0)
f2.seek(file_['mid'], 0)
line = f2.readline()               # consume the rest of the current (partial) line
fo.seek(f2.tell(), 0)
file_['mid'] = f2.tell()           # offset of the first full line after the midpoint

fo.seek(file_['mid'], 0)
print(fo.readline())
</code></pre>
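<p>One way to generalize the midpoint idea to any number of chunks is sketched below. This is a minimal sketch under stated assumptions, not a definitive answer: the helper names <code>line_aligned_offsets</code> and <code>find_in_chunk</code> are hypothetical, the file is assumed to use <code>\n</code> line endings, and each worker is assumed to open its own handle so that no file offset is shared between threads. Because <code>readline()</code> only reads one line at a time, neither helper loads the whole file into memory.</p>

```python
import os


def line_aligned_offsets(path, parts):
    """Return parts + 1 byte offsets that split the file into chunks.

    Every offset is either 0, the file size, or the start of a line,
    so chunk i covers the bytes [offsets[i], offsets[i + 1]).
    """
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            target = size * i // parts
            if target == 0:
                # File too small to split here; emit an empty chunk.
                offsets.append(0)
                continue
            # Start one byte early: if that byte is a newline, target is
            # already a line start and readline() stops exactly there.
            f.seek(target - 1)
            f.readline()
            offsets.append(f.tell())
    offsets.append(size)
    return offsets


def find_in_chunk(path, start, end, wanted):
    """Scan lines starting in [start, end) and return the byte offset of
    the first line equal to wanted (bytes, no newline), or None."""
    with open(path, "rb") as f:  # private handle, private offset
        f.seek(start)
        while f.tell() < end:
            pos = f.tell()
            line = f.readline()
            if line.rstrip(b"\r\n") == wanted:
                return pos
    return None
```

<p>Each thread could then be handed one <code>(offsets[i], offsets[i + 1])</code> pair and call <code>find_in_chunk</code> on it. Seeking to <code>target - 1</code> rather than <code>target</code> is a design choice that avoids skipping a line when the target already falls on a line start.</p>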