Python: is BeautifulSoup misreporting my HTML?
<p>I have two machines, each running, to the best of my knowledge, Python 2.5 and BeautifulSoup 3.1.0.1.</p>

<p>I'm trying to scrape <a href="http://utahcritseries.com/RawResults.aspx" rel="nofollow noreferrer">http://utahcritseries.com/RawResults.aspx</a>, using:</p>

<pre><code>from BeautifulSoup import BeautifulSoup
import urllib2

base_url = "http://www.utahcritseries.com/RawResults.aspx"
data = urllib2.urlopen(base_url)
soup = BeautifulSoup(data)

i = 0
table = soup.find("table", id='ctl00_ContentPlaceHolder1_gridEvents')
#table=soup.table
print "begin table"
# Skip the header row and look at the next nine rows only.
for row in table.findAll('tr')[1:10]:
    i = i + 1
    col = row.findAll('td')
    date = col[0].string
    event = col[1].a.string
    confirmed = col[2].string
    print '%s - %s' % (date, event)
print "end table"
print "%s rows processed" % i
</code></pre>

<p>On my Windows machine, I get the correct result, which is a list of dates and event names. On my Mac, I don't. Instead, I get:</p>

<pre><code>3/2/2002 - Rocky Mtn Raceway Criterium
None - Rocky Mtn Raceway Criterium
3/23/2002 - Rocky Mtn Raceway Criterium
None - Rocky Mtn Raceway Criterium
4/2/2002 - Rocky Mtn Raceway Criterium
None - Saltair Time Trial
4/9/2002 - Rocky Mtn Raceway Criterium
None - DMV Criterium
4/16/2002 - Rocky Mtn Raceway Criterium
</code></pre>

<p>What I'm noticing is that when I</p>

<pre><code>print row
</code></pre>

<p>on my Windows machine, the tr data looks exactly the same as the source HTML. Note the style attribute on the second table row. Here are the first two rows:</p>

<pre><code>&lt;tr&gt;
  &lt;td&gt; 3/2/2002 &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=226"&gt; Rocky Mtn Raceway Criterium &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; Confirmed &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=226"&gt; Points &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; &lt;a disabled="disabled"&gt; Results &lt;/a&gt; &lt;/td&gt;
&lt;/tr&gt;
&lt;tr style="color:#333333;background-color:#EFEFEF;"&gt;
  &lt;td&gt; 3/16/2002 &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=227"&gt; Rocky Mtn Raceway Criterium &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; Confirmed &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=227"&gt; Points &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; &lt;a disabled="disabled"&gt; Results &lt;/a&gt; &lt;/td&gt;
&lt;/tr&gt;
</code></pre>

<p>On my Mac, when I print the first two rows, the style information is removed from the tr tag and moved into each td cell as a font tag. I don't understand why this is happening. I'm getting None for every other date value because a font tag is wrapped around every other date. Here's the Mac's output:</p>

<pre><code>&lt;tr&gt;
  &lt;td&gt; 3/2/2002 &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=226"&gt; Rocky Mtn Raceway Criterium &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; Confirmed &lt;/td&gt;
  &lt;td&gt; &lt;a href="Event.aspx?id=226"&gt; Points &lt;/a&gt; &lt;/td&gt;
  &lt;td&gt; &lt;a disabled="disabled"&gt; Results &lt;/a&gt; &lt;/td&gt;
&lt;/tr&gt;
&lt;tr bgcolor="#EFEFEF"&gt;
  &lt;td&gt; &lt;font color="#333333"&gt; 3/16/2002 &lt;/font&gt; &lt;/td&gt;
  &lt;td&gt; &lt;font color="#333333"&gt; &lt;a href="Event.aspx?id=227"&gt; Rocky Mtn Raceway Criterium &lt;/a&gt; &lt;/font&gt; &lt;/td&gt;
  &lt;td&gt; &lt;font color="#333333"&gt; Confirmed &lt;/font&gt; &lt;/td&gt;
  &lt;td&gt; &lt;font color="#333333"&gt; &lt;a href="Event.aspx?id=227"&gt; Points &lt;/a&gt; &lt;/font&gt; &lt;/td&gt;
  &lt;td&gt; &lt;font color="#333333"&gt; &lt;a disabled="disabled"&gt; Results &lt;/a&gt; &lt;/font&gt; &lt;/td&gt;
&lt;/tr&gt;
</code></pre>

<p>My script displays the correct result under Windows. What do I need to do to get my Mac to work correctly?</p>
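<p>One way to make the extraction independent of which markup arrives is to collect all the text inside each cell instead of relying on the cell having a single string child. The following is only a sketch under stated assumptions, not a diagnosis of why the two machines receive different HTML. It uses Python 3 and the standard library's <code>html.parser</code> rather than Python 2.5 and BeautifulSoup, and <code>CellTextParser</code>, <code>cell_texts</code> and <code>SAMPLE</code> are hypothetical names introduced for illustration. The sample is a shortened copy of the two row shapes shown above.</p>

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td>, ignoring wrapper tags such as <font>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join every fragment seen inside the cell, whatever tags wrapped it.
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def cell_texts(html):
    """Return a list of rows, each a list of stripped cell texts."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample: one plain row and one row with <font> wrappers.
SAMPLE = """
<table>
<tr><td> 3/2/2002 </td>
<td><a href="Event.aspx?id=226"> Rocky Mtn Raceway Criterium </a></td></tr>
<tr bgcolor="#EFEFEF"><td><font color="#333333"> 3/16/2002 </font></td>
<td><font color="#333333"><a href="Event.aspx?id=227"> Rocky Mtn Raceway Criterium </a></font></td></tr>
</table>
"""

for date, event in cell_texts(SAMPLE):
    print("%s - %s" % (date, event))
```

<p>Both rows yield a date here, with or without the <code>font</code> wrapper. In BeautifulSoup 3 itself, the comparable idiom would be something like <code>''.join(col[0].findAll(text=True)).strip()</code> in place of <code>col[0].string</code>. That line is untested against the page in question.</p>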
 
