Parsing Web Page's Search Results With Python
<p>I recently started working on a Python program that lets the user conjugate any verb easily. To do this, I am using the urllib module to open the corresponding conjugation web page. For example, the verb "beber" has the web page:</p>
<blockquote>
<p>"<a href="http://www.spanishdict.com/conjugate/beber" rel="nofollow">http://www.spanishdict.com/conjugate/beber</a>"</p>
</blockquote>
<p>To open the page, I use the following Python code:</p>
<pre><code>source = urllib.urlopen("http://www.spanishdict.com/conjugate/beber").read()
</code></pre>
<p>This source does contain the information that I want to parse. However, when I make a BeautifulSoup object out of it like this:</p>
<pre><code>soup = BeautifulSoup(source)
</code></pre>
<p>I appear to lose all the information I want to parse. The information lost when making the BeautifulSoup object usually looks something like this:</p>
<pre><code>&lt;tr&gt;
  &lt;td class="verb-pronoun-row"&gt; yo &lt;/td&gt;
  &lt;td class=""&gt; bebo &lt;/td&gt;
  &lt;td class=""&gt; bebí &lt;/td&gt;
  &lt;td class=""&gt; bebía &lt;/td&gt;
  &lt;td class=""&gt; bebería &lt;/td&gt;
  &lt;td class=""&gt; beberé &lt;/td&gt;
&lt;/tr&gt;
</code></pre>
<p>What am I doing wrong? I am no professional at Python or web parsing in general, so it may be a simple problem.</p>
<p>Here is my complete code (I print a line of "+" characters to separate the two outputs):</p>
<pre><code># Python 2 (urllib.urlopen and the print statement)
import urllib
from bs4 import BeautifulSoup

# Download the raw HTML of the conjugation page
source = urllib.urlopen("http://www.spanishdict.com/conjugate/beber").read()
# Parse it; no parser is named, so BeautifulSoup picks one itself
soup = BeautifulSoup(source)

# Print the raw source, a separator, then the parsed document
print source
print "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
print str(soup)
</code></pre>
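One way to check whether the conjugation cells really are present in the raw source, independently of BeautifulSoup, is to pull the table cells out with a different parser. The following is a minimal Python 3 sketch that uses only the standard library's html.parser module; it is not the BeautifulSoup code from the question. The names CellCollector and extract_cells are hypothetical, and SAMPLE_ROW is the row quoted in the question, standing in for the live page, whose exact markup is an assumption here.

```python
from html.parser import HTMLParser

# Sample row copied from the question. The live page is assumed
# (not verified) to contain rows shaped like this one.
SAMPLE_ROW = """
<tr>
  <td class="verb-pronoun-row"> yo </td>
  <td class=""> bebo </td>
  <td class=""> bebí </td>
  <td class=""> bebía </td>
  <td class=""> bebería </td>
  <td class=""> beberé </td>
</tr>
"""


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the stripped text of every <td> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []       # finished cell texts, in document order
        self._in_td = False   # True while inside a <td> element
        self._buffer = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_td:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buffer).strip())
            self._in_td = False


def extract_cells(html):
    """Return the text of each <td> cell found in the given HTML string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


cells = extract_cells(SAMPLE_ROW)
print(cells)
```

If a check like this finds the cells in the downloaded source while the BeautifulSoup object lacks them, the difference plausibly comes from how the underlying parser handles the page's markup. BeautifulSoup accepts a parser name as its second argument, for example BeautifulSoup(source, "html.parser"), so comparing the output of different parsers is one thing to try.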