Most efficient way to parse a large .csv in Python?
<p>I tried looking at other answers, but I am still not sure of the right way to do this. I have a number of really large .csv files (up to a gigabyte each). I want to first get their column labels, because they are not all the same, and then, according to user preference, extract some of these columns that meet certain criteria. Before starting on the extraction part, I did a simple test to see which is the fastest way to parse these files. Here is my code:</p>
<pre><code>import csv
import mmap
import time

def mmapUsage():
    start = time.time()
    with open("csvSample.csv", "r+b") as f:
        # memory-map the file; size 0 means the whole file
        mapInput = mmap.mmap(f.fileno(), 0)
        # read content via standard file methods
        L = list()
        for s in iter(mapInput.readline, ""):
            L.append(s)
        print "List length: ", len(L)
        #print "Sample element: ", L[1]
        mapInput.close()
    end = time.time()
    print "Time for completion", end - start

def fileopenUsage():
    start = time.time()
    fileInput = open("csvSample.csv")
    M = list()
    for s in fileInput:
        M.append(s)
    print "List length: ", len(M)
    #print "Sample element: ", M[1]
    fileInput.close()
    end = time.time()
    print "Time for completion", end - start

def readAsCsv():
    X = list()
    start = time.time()
    # use a with-block so the file is closed once parsing finishes
    with open('csvSample.csv', 'rb') as csvFile:
        spamReader = csv.reader(csvFile)
        for row in spamReader:
            X.append(row)
    print "List length: ", len(X)
    #print "Sample element: ", X[1]
    end = time.time()
    print "Time for completion", end - start
</code></pre>
<p>And my results:</p>
<pre><code>=======================
Populating list from Mmap
List length: 1181220
Time for completion 0.592000007629
=======================
Populating list from Fileopen
List length: 1181220
Time for completion 0.833999872208
=======================
Populating list by csv library
List length: 1181220
Time for completion 5.06700015068
</code></pre>
<p>So it seems that the csv library, which most people use, is a lot slower than the others: about 6 times slower than the plain file loop and about 8.5 times slower than mmap on this sample. Maybe it will prove faster once I start extracting data from the files, but I cannot be sure of that yet. Any suggestions and tips before I start implementing? Thanks a lot!</p>
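<p>A minimal sketch of the two steps described in the question (read each file's column labels, then pull out only the columns the user picks, filtered by a condition), assuming Python 3 and the standard-library <code>csv</code> module. The helper names <code>read_header</code> and <code>extract_columns</code>, the <code>keep_row</code> predicate and the sample data are hypothetical, chosen only for illustration. Rows are yielded one at a time instead of being appended to a list, so memory use stays flat even for gigabyte-sized files.</p>

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, TextIO


def read_header(f: TextIO) -> List[str]:
    """Return the column labels from the first row of an open CSV file.

    Hypothetical helper for illustration; it reads only one row.
    """
    return next(csv.reader(f), [])


def extract_columns(
    f: TextIO,
    wanted: Iterable[str],
    keep_row: Callable[[Dict[str, str]], bool],
) -> Iterator[Dict[str, str]]:
    """Yield only the wanted columns of the rows for which keep_row is true.

    Hypothetical helper for illustration. Rows are produced one at a time,
    so memory use does not grow with the size of the file.
    """
    cols = list(wanted)
    reader = csv.DictReader(f)
    available = reader.fieldnames or []  # accessing this consumes the header row
    missing = [c for c in cols if c not in available]
    if missing:
        raise KeyError(f"columns not in file: {missing}")
    for row in reader:
        if keep_row(row):
            yield {c: row[c] for c in cols}


# Usage with an in-memory stand-in for one of the large files (made-up data).
sample = io.StringIO('id,name,price\n1,apple,0.5\n2,"pear, green",1.25\n3,plum,2.0\n')
print(read_header(sample))  # ['id', 'name', 'price']
sample.seek(0)  # rewind: read_header advanced past the header line
for row in extract_columns(sample, ["name", "price"], lambda r: float(r["price"]) > 1):
    print(row)
# {'name': 'pear, green', 'price': '1.25'}
# {'name': 'plum', 'price': '2.0'}
```

<p>For real files, the Python 3 <code>csv</code> documentation recommends opening them with <code>open(path, newline="")</code>. Because reading the header advances the file position, call <code>read_header</code> and <code>extract_columns</code> on separate opens, or rewind with <code>seek(0)</code> in between as the example does.</p>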
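<p>One caveat about the benchmark: the mmap and plain-file loops only collect raw lines, while <code>csv.reader</code> also splits every line into fields and applies CSV quoting rules, so the three timings do not measure the same work. The sketch below (Python 3, with a made-up sample line and hypothetical helper names <code>naive_rows</code> and <code>csv_rows</code>) shows what that extra work buys: a naive <code>line.split(",")</code> breaks a quoted field that contains a comma, whereas <code>csv.reader</code> keeps it intact.</p>

```python
import csv
import io

# Made-up sample: the second record has a quoted field containing a comma.
SAMPLE = 'id,name,price\n2,"pear, green",1.25\n'


def naive_rows(text):
    """Hypothetical helper: split each line on commas, ignoring quotes."""
    return [line.split(",") for line in text.splitlines()]


def csv_rows(text):
    """Hypothetical helper: parse with the csv module, which honours quoting."""
    return list(csv.reader(io.StringIO(text)))


print(naive_rows(SAMPLE)[1])  # ['2', '"pear', ' green"', '1.25']
print(csv_rows(SAMPLE)[1])    # ['2', 'pear, green', '1.25']
```

<p>If you know for certain that no field contains a comma, a quote or an embedded newline, splitting lines yourself may well be faster. Otherwise the slower parse is the price of correct results, and a fair benchmark should include the field-splitting step in all three approaches.</p>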