(Python) Best way to parse a file to avoid performance issues
I am a bit concerned about the best way to handle a file containing information that has to be isolated.

As an example, imagine a log file whose data is divided into blocks, where each block has a list of sub-blocks.

Example of the log file:

```
data
data
data
data
block 1 start
-sub block 1 start
--data x
--data y
-sub block 1 end
-sub block 2 start
--data x
--data marked as good
--data z
-sub block 2 end
block 1 end
block 1 summary
block 2 start
-sub block 1 start
.....
-sub block 1 end
....
data
data
data
```

I am looking for an efficient way to parse this rather large file (several MB of text), isolate the blocks, and then check the sub-blocks of each block for a specific line. If the line is in a sub-block, I save the start and end lines of the block the sub-block belongs to, together with the sub-block containing the line (discarding the other sub-blocks that do not have the data), until I hit the end of the file.

Example of what the results should look like:

```
block 1 start
-sub block 2 start
--data marked as good
-sub block 2 end
block 1 summary
.....
```

Right now I am using this approach: I open the file, then divide it into smaller subsets to work with, and I keep three lists that gather the info.

The first list, called list_general, contains the results of parsing the whole log file, minus everything unrelated to the blocks I need to isolate. After this step I have only the blocks as in the example above, minus the "data" lines. While doing this I check for the "good data" string; if I see that string at least once, it means there is data I need to process and save, otherwise I just end the function.

If there is data to process, I go line by line through list_general and start to isolate each block and its sub-blocks, starting from the first block (so from "block 1 start" to "block 1 summary" in the example).

Once I hit the end of a block ("block 1 summary"), if the data marked as good is present, I start to parse it, going through each sub-block to find which one has the good data.

I copy each sub-block line by line, as I did for the blocks (basically copying line by line from "sub block 1 start" to "sub block 1 end"), and check whether the good data is in that sub-block. If it is, I copy the list's contents to the final list; otherwise I delete the list and start over with the next sub-block.

I know that this mechanism of parsing each section is very cumbersome and expensive resource-wise, so I was wondering whether there is a better way to do this. I am pretty new to Python, so I am not sure how a problem like this is usually approached. Hopefully someone here has run into a similar issue and can suggest the best way to tackle it.
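For reference, here is a simplified, runnable sketch of the multi-pass approach described above. The names (parse_file, filter_block, the GOOD marker) and the startswith/endswith checks are illustrative guesses at the real code and log format, not the actual implementation:

```python
# Simplified sketch of the current multi-pass approach.
# All names and format checks here are illustrative assumptions.

GOOD = "data marked as good"

def parse_file(path):
    # Pass 1: keep only lines that belong to blocks (drop the loose
    # "data" lines) and remember whether the good marker appears at all.
    list_general = []
    inside_block = False
    seen_good = False
    with open(path) as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith("block") and line.endswith("start"):
                inside_block = True
            if inside_block:
                list_general.append(line)
                if GOOD in line:
                    seen_good = True
            if line.startswith("block") and line.endswith("summary"):
                inside_block = False
    if not seen_good:
        return []  # nothing to process, end early

    # Pass 2: walk list_general block by block, keeping only blocks
    # that contain the marker, filtered down to their good sub-blocks.
    final = []
    block = []
    for line in list_general:
        block.append(line)
        if line.startswith("block") and line.endswith("summary"):
            if any(GOOD in s for s in block):
                final.extend(filter_block(block))
            block = []
    return final

def filter_block(block):
    # Keep the block's start/summary lines plus only the sub-blocks
    # containing the marker; other sub-blocks are discarded.
    kept = [block[0]]                  # "block N start"
    sub = []
    for line in block[1:-1]:
        if line.startswith("-sub") and line.endswith("start"):
            sub = [line]
        elif line.startswith("-sub") and line.endswith("end"):
            sub.append(line)
            if any(GOOD in s for s in sub):
                kept.extend(sub)       # keep the whole good sub-block
            sub = []                   # otherwise drop it
        elif sub:
            sub.append(line)
    kept.append(block[-1])             # "block N summary"
    return kept

if __name__ == "__main__":
    for line in parse_file("my.log"):  # hypothetical path
        print(line)
```

In the real code the splitting into smaller subsets and the three lists make this messier, but the overall shape is the same: one pass to strip unrelated lines and detect the marker, then a second pass copying blocks and sub-blocks list by list.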