Efficiently parsing huge logs that are divided into sections
<p>I have a <strong>huge log file</strong> (several gigabytes) made up of sections: a heading marks the beginning of each section, and the section's information follows it. Sections appear in no fixed order, so a heading followed by its information can occur anywhere in the file. The sections can also be nested in a parent-child hierarchy.</p>
<p>I need to parse and process this log using regexes that identify the heading pattern at the start of each section, and then process the information that follows. The problem is that I have to match the regexes for all sections against every line of the log file to determine which section is starting. <strong>This approach is very slow, and it also runs into problems because the parser has no way of knowing what comes next in the log; the problems get worse when there is a hierarchy</strong>.</p>
<p>I have thought of <strong>indexing the file</strong> by dividing it into chunks recursively (divide and conquer) and assigning the chunks to multiple <strong>actors (Scala)</strong>, so that each line can be matched in parallel against all the regexes that represent the beginning of a section. I would like to know how efficient this approach is, and <strong>I would welcome more suggestions for improving performance</strong>.</p>
<p>For reference, here is the pattern in which the log file may appear:</p>
<pre><code>Section1
--------------
Info for section1
..
...
....
.
.
Section2
--------------
Info for section2
..
...
....
.
.
Section3
=================
Info for section3
Child1 of section3
--------------
Info for child of section3
Child2 of section3
----------------
Info for child of section3
Child1 of child2 which is child of section3
.........................
Info for child1 of child2 which is child of section3
Section1
-------------- //Section1 reappears
Info for section1
..
...
....
.
.
</code></pre>
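One way to avoid running every section regex against every line is to let the underline row act as a cheap trigger: only a line that is immediately followed by an underline is treated as a heading candidate, and that candidate is classified once with a single combined regex. The sketch below is in Python for brevity rather than Scala, and it is only an illustration under stated assumptions. The names `HEADING_PATTERNS`, `UNDERLINE_RE` and `split_sections` are hypothetical, the heading patterns are placeholders for the real per-section regexes, and it assumes every heading is followed by a line of at least five repeated `-`, `=` or `.` characters with no trailing annotation. It does not track the parent-child hierarchy, because the sample does not pin down how depth is encoded.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical heading patterns, one per section type. The real ones would be
# the existing per-section regexes.
HEADING_PATTERNS = {
    "section1": r"Section1",
    "section2": r"Section2",
    "section3": r"Section3",
    "child": r"Child\d+ of .+",
}

# One alternation with a named group per section type, so a heading candidate
# is classified with a single match call.
HEADING_RE = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in HEADING_PATTERNS.items())
)

# Assumption: an underline is a whole line of 5+ repeated '-', '=' or '.'.
UNDERLINE_RE = re.compile(r"(?:-{5,}|={5,}|\.{5,})\s*")

Section = Tuple[str, str, List[str]]  # (section type, heading text, body lines)


def split_sections(lines: Iterable[str]) -> Iterator[Section]:
    """Stream over the log once and yield one tuple per section.

    Lines that appear before the first recognised heading are dropped.
    """
    current: Optional[Section] = None
    prev: Optional[str] = None  # held back until we know if it is a heading

    for raw in lines:
        line = raw.rstrip("\n")
        if prev is not None and UNDERLINE_RE.fullmatch(line):
            match = HEADING_RE.fullmatch(prev.strip())
            if match:
                if current is not None:
                    yield current
                current = (match.lastgroup, prev.strip(), [])
                prev = None  # heading and underline are both consumed
                continue
        if prev is not None and current is not None:
            current[2].append(prev)
        prev = line

    # Flush the last held-back line and the last open section.
    if prev is not None and current is not None:
        current[2].append(prev)
    if current is not None:
        yield current
```

Because the function is a generator over an iterable of lines, it can be fed an open file object directly (`for section in split_sections(open(path)):`), so the whole log never has to sit in memory. Whether this single pass is fast enough, or whether chunking the file across actors is still worthwhile, would have to be measured on the real data.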
 
