How to only print certain text using BeautifulSoup
<p>I am trying to pull some financial data for city governments using BeautifulSoup (I had to convert the files from PDF). I just want to get the data as a CSV file and then analyze it in Excel or SAS. My problem is that I do not want to print the <code>&amp;nbsp;</code> entities that are in the original HTML, just the numbers and the row heading. Any suggestions on how I can do this without using regex?</p>
<p>Below is a sample of the HTML I am looking at, followed by my code (currently just a proof of concept; I need to prove I can get clean data before moving on). I am new to Python and programming, so any help is appreciated.</p>
<pre><code>&lt;TD class="td1629"&gt;Investments (Note 2)&lt;/TD&gt;
&lt;TD class="td1605"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td479"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td1639"&gt;-&lt;/TD&gt;
&lt;TD class="td386"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td116"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td1634"&gt;2,207,592&lt;/TD&gt;
&lt;TD class="td479"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td1605"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td1580"&gt;2,207,592&lt;/TD&gt;
&lt;TD class="td301"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td388"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD class="td1637"&gt;2,882,018&lt;/TD&gt;
</code></pre>
<p>My code:</p>
<pre><code># Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulSoup

CAFR = open("C:/Users/snown/Documents/CAFR2004 BFS Statement of Net Assets.html", "r")
soup = BeautifulSoup(CAFR)
# Drill down to the statement table on page 27
assets_table = soup.find(True, id="page_27").find(True, id="id_1").find('table')
rows = assets_table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        # First text node in the cell; fall back to '' if the cell has none
        text = td.find(text=True) or ''
        print text + "|",
    print
</code></pre>
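A minimal sketch of one regex-free approach, assuming Python 3 with the `bs4` package (BeautifulSoup 4) rather than the BeautifulSoup 3 import used in the question. BeautifulSoup 4 decodes `&nbsp;` to the non-breaking space U+00A0, and `get_text(strip=True)` strips it like any other whitespace, so the spacer cells come out as empty strings and can be skipped. The standard `csv` module then takes care of quoting values such as `2,207,592`. `SAMPLE_HTML`, `clean_rows` and `table_to_csv` are hypothetical names for illustration, and the sample row is built from the cells shown in the question.

```python
import csv
import io

from bs4 import BeautifulSoup  # BeautifulSoup 4 (assumed installed: pip install beautifulsoup4)

# Hypothetical sample: one <tr> built from the cells shown in the question.
SAMPLE_HTML = """
<table><tr>
<TD class="td1629">Investments (Note 2)</TD>
<TD class="td1605">&nbsp;</TD>
<TD class="td479">&nbsp;</TD>
<TD class="td1639">-</TD>
<TD class="td386">&nbsp;</TD>
<TD class="td116">&nbsp;</TD>
<TD class="td1634">2,207,592</TD>
<TD class="td479">&nbsp;</TD>
<TD class="td1605">&nbsp;</TD>
<TD class="td1580">2,207,592</TD>
<TD class="td301">&nbsp;</TD>
<TD class="td388">&nbsp;</TD>
<TD class="td1637">2,882,018</TD>
</tr></table>
"""


def clean_rows(table):
    """Yield one list per <tr>, keeping only cells that contain visible text."""
    for tr in table.find_all("tr"):
        # bs4 turns &nbsp; into U+00A0; get_text(strip=True) strips it
        # like ordinary whitespace, so spacer cells become "".
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        row = [c for c in cells if c]
        if row:
            yield row


def table_to_csv(html):
    """Return the first <table> in `html` as CSV text."""
    soup = BeautifulSoup(html, "html.parser")
    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields that contain commas
    for row in clean_rows(soup.find("table")):
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML), end="")
```

On the sample this should produce a single CSV line, `Investments (Note 2),-,"2,207,592","2,207,592","2,882,018"`. Dropping empty cells only keeps columns aligned if the `&nbsp;` spacers are the only empty cells; if a real data column can be blank, selecting cells by position or by `class` would be safer. For the real file, `soup.find("table")` would be swapped for the question's `find(True, id="page_27")` chain, which bs4 also accepts.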