How can I translate this XPath expression to BeautifulSoup?
<p>In answer to a <a href="https://stackoverflow.com/questions/1813921/how-to-search-a-html-page-for-an-item-in-a-given-list/1814616#1814616">previous question</a>, several people suggested that I use <a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow noreferrer">BeautifulSoup</a> for my project. I've been struggling with its documentation and I just cannot parse it. Can somebody point me to the section that would help me translate this expression into a BeautifulSoup expression?</p>
<pre><code>hxs.select('//td[@class="altRow"][2]/a/@href').re('/.a\w+')
</code></pre>
<p>The above expression is from <a href="http://scrapy.org/" rel="nofollow noreferrer">Scrapy</a>. I'm trying to apply the regex <code>re('/.a\w+')</code> to the <code>td</code> cells with class <code>altRow</code> to get the links from there.</p>
<p>I would also appreciate pointers to any other tutorials or documentation. I couldn't find any.</p>
<p>Thanks for your help.</p>
<p><b>Edit:</b> I am looking at this <a href="http://www.whitecase.com/Attorneys/List.aspx?LastName=&amp;FirstName=" rel="nofollow noreferrer">page</a>:</p>
<pre><code>&gt;&gt;&gt; soup.head.title
&lt;title&gt;White &amp; Case LLP - Lawyers&lt;/title&gt;
&gt;&gt;&gt; soup.find(href=re.compile("/cabel"))
&gt;&gt;&gt; soup.find(href=re.compile("/diversity"))
&lt;a href="/diversity/committee"&gt;Committee&lt;/a&gt;
</code></pre>
<p>Yet, if you look at the page source, <code>"/cabel"</code> is there:</p>
<pre><code>&lt;td class="altRow" valign="middle" width="34%"&gt;
&lt;a href='/cabel'&gt;Abel, Christian&lt;/a&gt;
</code></pre>
<p>For some reason, the search results are not visible to BeautifulSoup, but they are visible to XPath, because <code>hxs.select('//td[@class="altRow"][2]/a/@href').re('/.a\w+')</code> catches "/cabel".</p>
<p><b>Edit:</b> cobbal: It is still not working. But when I search this:</p>
<pre><code>&gt;&gt;&gt; soup.findAll(href=re.compile(r'/.a\w+'))
[&lt;link href="/FCWSite/Include/styles/main.css" rel="stylesheet" type="text/css" /&gt;,
 &lt;link rel="shortcut icon" type="image/ico" href="/FCWSite/Include/main_favicon.ico" /&gt;,
 &lt;a href="/careers/northamerica"&gt;North America&lt;/a&gt;,
 &lt;a href="/careers/middleeastafrica"&gt;Middle East Africa&lt;/a&gt;,
 &lt;a href="/careers/europe"&gt;Europe&lt;/a&gt;,
 &lt;a href="/careers/latinamerica"&gt;Latin America&lt;/a&gt;,
 &lt;a href="/careers/asia"&gt;Asia&lt;/a&gt;,
 &lt;a href="/diversity/manager"&gt;Diversity Director&lt;/a&gt;]
&gt;&gt;&gt;
</code></pre>
<p>it returns all the links that contain a path segment whose second character is "a", but not the lawyer names. So for some reason those links (such as "/cabel") are not visible to BeautifulSoup. I don't understand why.</p>
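<p>As a sanity check on the regex itself, here is a minimal sketch that uses only the standard library <code>re</code> module and the <code>href</code> values quoted above. The <code>hrefs</code> list and the <code>first_match</code> helper are names made up for this illustration, and the sketch assumes the pattern is applied with <code>search</code> semantics. If it behaves as expected, the pattern does match "/cabel", which would suggest the missing links are a parsing issue rather than a regex issue.</p>

```python
import re

# Pattern copied from the question: "/", any one character, "a", then word characters.
PATTERN = re.compile(r'/.a\w+')

# href values quoted in the question (hypothetical test list for this sketch).
hrefs = [
    "/cabel",
    "/careers/northamerica",
    "/diversity/manager",
    "/diversity/committee",
]


def first_match(href):
    """Return the first substring of href that matches PATTERN, or None."""
    m = PATTERN.search(href)
    return m.group(0) if m else None


# Map each href to the part of it that the pattern picks out.
matches = {href: first_match(href) for href in hrefs}

for href, hit in matches.items():
    print(href, "->", hit)
```

<p>Note that "/diversity/manager" matches only through its second segment, "/manager", which is why that link shows up in the <code>findAll</code> output even though "/diversity" itself does not fit the pattern.</p>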