Read CSV from within Zip File
<p>I have a directory of zip files (approximately 10,000 small files). Each contains a CSV file that I am trying to read and split into a number of different CSV files.</p>
<p>I managed to write the code to split the CSV files from a directory of CSVs, shown below. It reads the first attribute of each row and, depending on its value, writes the row to the relevant CSV.</p>
<pre><code>import csv
import os
import sys
import re
import glob

reader = csv.reader(open("C:/Projects/test.csv", "rb"), delimiter=',', quotechar='"')

write10 = csv.writer(open('ouput10.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
write15 = csv.writer(open('ouput15.csv', 'w'), delimiter=',', lineterminator='\n', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)

headings10=["RECORD_IDENTIFIER","CUSTODIAN_NAME","LOCAL_CUSTODIAN_NAME","PROCESS_DATE","VOLUME_NUMBER","ENTRY_DATE","TIME_STAMP","VERSION","FILE_TYPE"]
write10.writerow(headings10)

headings15=["RECORD_IDENTIFIER","CHANGE_TYPE","PRO_ORDER","USRN","STREET_DESCRIPTION","LOCALITY_NAME","TOWN_NAME","ADMINSTRATIVE_AREA","LANGUAGE"]
write15.writerow(headings15)

for row in reader:
    type = row[0]
    if "10" in type:
        write10.writerow(row)
    elif "15" in type:
        write15.writerow(row)
</code></pre>
<p>I am now trying to read the Zip files directly rather than wasting time extracting them first.</p>
<p>This is what I have so far, after following as many tutorials as I could find:</p>
<pre><code>import glob
import os
import csv
import zipfile
import StringIO

for name in glob.glob('C:/Projects/abase/*.zip'):
    base = os.path.basename(name)
    filename = os.path.splitext(base)[0]

    datadirectory = 'C:/Projects/abase/'
    dataFile = filename
    archive = '.'.join([dataFile, 'zip'])
    fullpath = ''.join([datadirectory, archive])
    csv = '.'.join([dataFile, 'csv'])

    filehandle = open(fullpath, 'rb')
    zfile = zipfile.ZipFile(filehandle)
    data = StringIO.StringIO(zfile.read(csv))
    reader = csv.reader(data)

    for row in reader:
        print row
</code></pre>
<p>However, an error gets thrown:</p> <p>AttributeError: 'str' object has no attribute 'reader'</p> <p>Hopefully someone can show me how to change my working CSV reading code so that it reads from the Zip files.</p> <p>Much appreciated</p> <p>Tim</p>
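<p>For reference, the error comes from the line <code>csv = '.'.join([dataFile, 'csv'])</code>: it rebinds the name <code>csv</code> from the module to a string, so the later <code>csv.reader(data)</code> is looked up on a <code>str</code>. Below is a minimal sketch of one way around it, not a definitive fix. It assumes Python 3 (the question's code is Python 2), assumes each archive holds a CSV with the same base name as the zip (as the question's code implies), and assumes UTF-8 text. The helper name <code>iter_zip_csv_rows</code> and the variable <code>member_name</code> are hypothetical, introduced only for illustration.</p>

```python
import csv
import glob
import io
import os
import zipfile


def iter_zip_csv_rows(zip_path, member_name=None, encoding="utf-8"):
    """Yield the rows of one CSV stored inside a zip archive, without extracting it.

    member_name defaults to "<zip base name>.csv", which is an assumption
    taken from the question's code. The encoding is also an assumption.
    """
    if member_name is None:
        base = os.path.splitext(os.path.basename(zip_path))[0]
        # Use a name other than "csv" so the csv module is not shadowed.
        member_name = base + ".csv"

    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # ZipFile.open returns a binary stream; csv.reader needs text.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.reader(text, delimiter=",", quotechar='"'):
                yield row


def iter_all_rows(pattern):
    """Yield rows from every archive matching a glob pattern."""
    for zip_path in glob.glob(pattern):
        for row in iter_zip_csv_rows(zip_path):
            yield row
```

<p>The rows from <code>iter_all_rows('C:/Projects/abase/*.zip')</code> could then go through the same <code>"10"</code> / <code>"15"</code> test and <code>write10</code> / <code>write15</code> writers as in the first script.</p>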
 
