<p>Another possible (system-admin) approach avoids the database and SQL queries, along with much of their runtime and hardware requirements.</p>
<p><strong>Update 20/04</strong> Added more code and simplified the approach:</p>
<ol>
<li><a href="https://stackoverflow.com/questions/6819601/convert-datetime-in-to-epoch-using-awk-piped-data">Convert the timestamp</a> to seconds (from Epoch) and use UNIX <code>sort</code>, keyed on the email and this new field (that is: <code>sort -t, -k2,2 -k4,4n &lt; converted_input_file &gt; output_file</code>)</li>
<li>Initialize three variables: <code>EMAIL</code>, <code>PREV_TIME</code> and <code>COUNT</code></li>
<li>Iterate over each line. If a new email is encountered, add "1,0 day". Update <code>PREV_TIME=timestamp</code>, <code>COUNT=1</code>, <code>EMAIL=new_email</code></li>
<li>Next line: three possible scenarios
<ul>
<li>a) If same email, different timestamp: calculate the days, increment <code>COUNT</code> by 1, update <code>PREV_TIME</code>, add "Count, Difference_in_days"</li>
<li>b) If same email, same timestamp: increment <code>COUNT</code>, add "COUNT, 0 day"</li>
<li>c) If new email, start from 3.</li>
</ul></li>
</ol>
<p>An alternative to 1. is to add a new field TIMESTAMP and remove it when printing out the line.</p>
<p>Note: If 1.5GB is too large to sort in one go, split it into smaller chunks, using the email as the split point. You can run these chunks in parallel on different machines.</p>
<pre><code>/usr/bin/gawk -F'","' '
BEGIN {
    # Build the month-name to month-number lookup once, not on every record
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
    for (i = 1; i &lt;= 12; i++) mdigit[month[i]] = i
}
{
    # $4 looks like 26JUL2010; append its epoch seconds as a new last field
    print $0 "," mktime(substr($4,6,4) " " mdigit[substr($4,3,3)] " " substr($4,1,2) " 00 00 00")
}' &lt; input.txt | /usr/bin/sort -t, -k2,2 -k7,7n &gt; output_file.txt
</code></pre>
<p>output_file.txt:</p>
<blockquote>
<p>"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1280102400<br>
"DF","0001HARISH@GMAIL.COM","NF252022031180","09DEC2010","B2C","3439",1291852800<br>
"DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",1292112000<br>
"DF","0001HARISH@GMAIL.COM","NF251352240086","22DEC2010","B2C","4006",1292976000<br>
...</p>
</blockquote>
<p>Pipe the output to a Perl, Python or AWK script to perform steps 2 through 4.</p>
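<p>As an illustration of steps 2 through 4, here is a minimal Python sketch. It assumes the input is already sorted by email and then by the appended epoch field, that the email is the second field and the epoch the last one (as in output_file.txt above), and that the result should be appended as ",COUNT,N day". The names <code>annotate</code> and <code>SAMPLE</code> are made up for this example and are not part of the original answer.</p>

```python
import csv

SECONDS_PER_DAY = 86400

# Rows copied from output_file.txt above (already sorted by email, then epoch).
SAMPLE = [
    '"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1280102400',
    '"DF","0001HARISH@GMAIL.COM","NF252022031180","09DEC2010","B2C","3439",1291852800',
    '"DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",1292112000',
    '"DF","0001HARISH@GMAIL.COM","NF251352240086","22DEC2010","B2C","4006",1292976000',
]


def annotate(lines):
    """Yield each sorted input line with ',COUNT,N day' appended (hypothetical helper)."""
    email = None      # EMAIL: address of the group currently being processed
    prev_time = None  # PREV_TIME: epoch seconds of the previous row
    count = 0         # COUNT: rows seen so far for this email
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        row = next(csv.reader([line]))
        new_email, timestamp = row[1], int(row[-1])
        if new_email != email:
            # Step 3 / case c: a new email starts a fresh group with "1,0 day"
            email, count, days = new_email, 1, 0
        else:
            # Cases a and b: same email; an identical timestamp simply gives 0 days
            count += 1
            days = (timestamp - prev_time) // SECONDS_PER_DAY
        prev_time = timestamp
        yield f"{line},{count},{days} day"


for annotated in annotate(SAMPLE):
    print(annotated)
```

<p>To process the real file, the same function could be fed from standard input (for example <code>annotate(sys.stdin)</code> after <code>import sys</code>), which keeps memory use constant because only the previous row's state is retained.</p>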