<h2>Comma-separated values</h2>
<p>The file format you're describing is simply CSV. See the Wikipedia article "Comma-separated values" for details.</p>
<p>In Python, you can use the <code>csv</code> module. The documentation is at <a href="http://docs.python.org/2/library/csv.html" rel="nofollow">http://docs.python.org/2/library/csv.html</a>.</p>
<p>The simplest way to write a CSV file is as follows:</p>
<pre><code>import csv

records = [[1951, 'Superman and the Mole Men', 'DC Comics', 'Lee Sholem'],
           [1966, 'Batman', 'DC Comics', 'Leslie H. Martinson'],
           [2002, 'Spider-Man', 'Marvel Comics', 'Sam Raimi'],
           [2008, 'Iron Man', 'Marvel Comics', 'Jon Favreau']]

# Python 2: the csv module expects the file to be opened in binary mode
with open('heros.csv', 'wb') as fp:
    writer = csv.writer(fp)
    writer.writerows(records)
</code></pre>
<p>The result is a classic CSV file:</p>
<pre><code>1951,Superman and the Mole Men,DC Comics,Lee Sholem
1966,Batman,DC Comics,Leslie H. Martinson
2002,Spider-Man,Marvel Comics,Sam Raimi
2008,Iron Man,Marvel Comics,Jon Favreau
</code></pre>
<p>Of course, you can add a header:</p>
<pre><code>with open('heros.csv', 'wb') as fp:
    writer = csv.writer(fp)
    writer.writerows([['Year', 'Film', 'Publisher', 'Director']])
    writer.writerows(records)
</code></pre>
<p><strong>Note:</strong> <code>writerows()</code> expects a list of rows, so the header is passed as a list of lists (look at the double brackets).</p>
<p>The result is the following CSV file:</p>
<pre><code>Year,Film,Publisher,Director
1951,Superman and the Mole Men,DC Comics,Lee Sholem
1966,Batman,DC Comics,Leslie H. Martinson
2002,Spider-Man,Marvel Comics,Sam Raimi
2008,Iron Man,Marvel Comics,Jon Favreau
</code></pre>
<h2>Reading an HTML table</h2>
<p>First of all, use a <code>with</code> statement to open files safely.</p>
<p>For example, to read a text file, proceed as follows:</p>
<pre><code>with open('sample.txt', 'r') as fp:
    content = fp.read()
</code></pre>
<p>That way, if an error occurs while reading, the file is closed automatically when the <code>with</code> block exits, before the exception propagates any further. Nothing is left open!</p>
<p>To read an HTML table with <code>BeautifulSoup</code> (which I don't know), you can do:</p>
<pre><code>from bs4 import BeautifulSoup  # assuming BeautifulSoup 4

with open("/company/a/searches/a") as html_file:
    soup = BeautifulSoup(html_file)

# Find the table first, then iterate over its rows
table = soup.find("table", {"id": "cos"})
records = []
for tr in table.findAll("tr"):
    cols = tr.findAll("td")
    if not cols:
        continue  # skip rows without data cells, e.g. header rows
    records.append([td.contents[0] for td in cols])
</code></pre>
<p>The <code>records</code> list will contain the entire table, one list per row. You can then write it to a CSV file.</p>
<h2>Handling Unicode values</h2>
<p>HTML content is Unicode text rather than plain ASCII, and I suppose that <code>td.contents[0]</code> will return a <code>unicode</code> instance.</p>
<p>However, the Python 2 <code>csv</code> module doesn't directly support reading and writing Unicode, so you will need to encode your <code>unicode</code> strings as <code>UTF-8</code> when writing the CSV file. I recommend looking at the <code>unicode_csv_reader()</code> function and the <code>UnicodeWriter</code> class in the examples: <a href="http://docs.python.org/2/library/csv.html#examples" rel="nofollow">http://docs.python.org/2/library/csv.html#examples</a>.</p>
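<p>As a rough illustration of the encoding step, here is a minimal sketch of a helper that turns every text cell of a row into UTF-8 bytes before the row is handed to the <code>csv</code> writer. The name <code>encode_row</code> and the sample records are hypothetical, chosen only for this example; this is not the recipe from the Python documentation.</p>

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text cell encoded to bytes.

    Hypothetical helper: cells that are not text (numbers, or strings
    that are already bytes) are passed through unchanged.
    """
    return [cell.encode(encoding) if isinstance(cell, TEXT_TYPE) else cell
            for cell in row]


# Illustrative records containing a non-ASCII character
records = [[2001, u"Am\u00e9lie"],
           [2008, u"Iron Man"]]

encoded_records = [encode_row(record) for record in records]
```

<p>On Python 2, <code>encoded_records</code> could then be passed to <code>writer.writerows()</code> on a file opened with <code>'wb'</code>, as in the first section.</p>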
 
