
Need to compare very large files (around 1.5 GB) in Python
    <pre><code>"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2" "Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025" "DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792" "Bus","00009.GAURAV@GMAIL.COM","NU27012932319739","26JAN2013","B2C","800" "Rail","0000.ANU@GMAIL.COM","NR251764697526","24JUN2011","B2C","595" "Rail","0000MANNU@GMAIL.COM","NR251277005737","29OCT2011","B2C","957" "Rail","0000PRANNOY0000@GMAIL.COM","NR251297862893","21NOV2011","B2C","212" "DF","0000PRANNOY0000@YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080" "Rail","0000RAHUL@GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731" "DF","0000SS0@GMAIL.COM","NF251355775967","10MAY2011","B2C","2000" "DF","0001HARISH@GMAIL.COM","NF251352240086","22DEC2010","B2C","4006" "DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000" "DF","0001HARISH@GMAIL.COM","NF252022031180","09DEC2010","B2C","3439" "Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136" "Rail","000AYUSH@GMAIL.COM","NR2151213260036","28NOV2012","B2C","41" "Rail","000AYUSH@GMAIL.COM","NR2151313264432","29NOV2012","B2C","96" "Rail","000AYUSH@GMAIL.COM","NR2151413266728","29NOV2012","B2C","96" "Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96" "Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96" </code></pre> <p>Above is the sample data. Data is sorted according to email addresses and the file is very large around 1.5Gb</p> <p>I want output in another csv file something like this</p> <pre><code>"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1,0 days "Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025",1,0 days "DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792",1,0 days "Bus","00009.GAURAV@GMAIL.COM","NU27012932319739","26JAN2013","B2C","800",1,0 days "Rail","0000.ANU@GMAIL.COM","NR251764697526","24JUN2011","B2C","595",1,0 days "Rail","0000MANNU@GMAIL.COM","NR251277005737","29OCT2011","B2C","957",1,0 days "Rail","0000PRANNOY0000@GMAIL.COM","NR251297862893","21NOV2011","B2C","212",1,0 days "DF","0000PRANNOY0000@YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080",1,0 days "Rail","0000RAHUL@GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731",1,0 days "DF","0000SS0@GMAIL.COM","NF251355775967","10MAY2011","B2C","2000",1,0 days "DF","0001HARISH@GMAIL.COM","NF251352240086","09DEC2010","B2C","4006",1,0 days "DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",2,3 days "DF","0001HARISH@GMAIL.COM","NF252022031180","22DEC2010","B2C","3439",3,10 days "Rail","000AYUSH@GMAIL.COM","NR2151213260036","28NOV2012","B2C","41",1,0 days "Rail","000AYUSH@GMAIL.COM","NR2151313264432","29NOV2012","B2C","96",2,1 days "Rail","000AYUSH@GMAIL.COM","NR2151413266728","29NOV2012","B2C","96",3,0 days "Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",4,9 days "Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",5,0 days "Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",6,4 days "Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",7,0 days "Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",8,44 days "Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",9,0 days </code></pre> <p>i.e if entry occurs 1st time i need to append 1 if it occurs 2nd time i need to append 2 and likewise i mean i need to count no of occurences of an email address in the file and if an email exists twice or more i 
want difference among dates and remember <strong>dates are not sorted</strong> so we have to sort them also against a particular email address and i am looking for a solution in python using numpy or pandas library or any other library that can handle this type of huge data without giving out of bound memory exception i have dual core processor with centos 6.3 and having ram of 4GB</p>
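<p>Here is a rough sketch of the kind of chunked pandas approach I have in mind; the file names ("transactions.csv", "output.csv"), the column names, and the chunk size are placeholders, and it relies on the file already being sorted by email address, so each address's rows are contiguous and one group can be finished at a time without loading all 1.5 GB:</p>

<pre><code>import pandas as pd

COLS = ["channel", "email", "txn_id", "date", "segment", "amount"]
OUT = "output.csv"

def flush_group(group):
    # Sort this email's rows by date, then append the running occurrence
    # count and the gap in days since the previous (sorted) row.
    group = group.sort_values("date")
    group["occurrence"] = range(1, len(group) + 1)
    days = group["date"].diff().dt.days.fillna(0).astype(int)
    group["gap"] = days.astype(str) + " days"
    group.to_csv(OUT, mode="a", header=False, index=False)

open(OUT, "w").close()  # start with an empty output file
pending = None  # trailing email group that may continue into the next chunk
for chunk in pd.read_csv("transactions.csv", names=COLS, chunksize=100_000):
    chunk["date"] = pd.to_datetime(chunk["date"], format="%d%b%Y")
    if pending is not None:
        chunk = pd.concat([pending, chunk], ignore_index=True)
    # Hold back the last address in the chunk: its rows may be split
    # across the chunk boundary, so only flush the addresses before it.
    last = chunk["email"].iloc[-1]
    pending = chunk[chunk["email"] == last]
    for _, group in chunk[chunk["email"] != last].groupby("email", sort=False):
        flush_group(group)
if pending is not None and not pending.empty:
    flush_group(pending)  # pending holds exactly one email's rows
</code></pre>

<p>Because only one chunk plus the group straddling its boundary is ever held in memory, this should stay well under 4 GB. Note that the sketch writes dates back in ISO format rather than the original DDMONYYYY, and it does not reproduce the exact quoting of the sample output; both would need a small formatting pass.</p>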