  1. XPath to select an element if the previous element contains matching text() - Python, Scrapy
    <p>I want to extract an element if the previous element's text() matches specific criteria. For example:</p>
<pre><code>&lt;html&gt;
&lt;div&gt;
  &lt;table class="layouttab"&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;General information:&amp;nbsp;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;(xxx) yyy-zzzz&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;Website:&amp;nbsp;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;&lt;a href="http://xyz.edu" target="_blank"&gt;http://www.xyz.edu&lt;/a&gt; &lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;Type:&amp;nbsp;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;4-year, Private for-profit&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;Awards offered:&amp;nbsp;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;Less than one year certificate&lt;br&gt;One but less than two years certificate&lt;br&gt;Associate's degree&lt;br&gt;Bachelor's degree &lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;Campus setting:&amp;nbsp;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;City: Small&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td scope="row" class="srb"&gt;Related Institutions:&lt;/td&gt;
        &lt;td&gt;&lt;a href="?q=xyz"&gt;xyz-New York&lt;/a&gt; (Parent):
          &lt;ul&gt;
            &lt;li style="list-style:circle"&gt;Berkeley College - Westchester Campus&lt;/li&gt;
          &lt;/ul&gt;
        &lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/div&gt;
&lt;/html&gt;
</code></pre>
    <p>Now, I want to extract the URL if the previous element has "Website: " in its text(). I am using Python 2.x with Scrapy 0.14. I was able to extract data by addressing an individual element, such as:</p>
<pre><code>item['Header_Type'] = site.select('div/table[@class="layouttab"]/tr[3]/td[2]/text()').extract()
</code></pre>
    <p>But this approach fails if the website row is missing: tr[3] shifts upward, so I get 'Type' in the website field and 'Awards offered' in Type.</p>
    <p>Is there a specific construct in XPath like the following?</p>
<pre><code>'div/table[@class="layouttab"]/tr/td[2] {if td[1] has text = "Website"}
</code></pre>
    <p>Thanks in advance.</p>
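The label-based lookup described above can be sketched without relying on row positions. In an engine with full XPath 1.0 support, the usual form is a predicate on the label cell followed by the `following-sibling` axis, for example `//td[contains(., "Website")]/following-sibling::td[1]/a/@href`; that expression has not been tested here against Scrapy 0.14. The runnable sketch below uses only Python's standard-library `xml.etree.ElementTree`, which supports just a subset of XPath (no `contains()` or `following-sibling`), so the label check is done in Python instead. The helper names `value_cell_for` and `website_url` and the trimmed `SAMPLE` markup are assumptions made for illustration, not part of the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question.
# &#160; is the numeric form of &nbsp;, which a plain XML parser accepts.
SAMPLE = """
<html><div><table class="layouttab"><tbody>
<tr><td scope="row" class="srb">General information:&#160;&#160;</td><td>(xxx) yyy-zzzz</td></tr>
<tr><td scope="row" class="srb">Website:&#160;&#160;</td><td><a href="http://xyz.edu" target="_blank">http://www.xyz.edu</a> </td></tr>
<tr><td scope="row" class="srb">Type:&#160;&#160;</td><td>4-year, Private for-profit</td></tr>
</tbody></table></div></html>
"""


def value_cell_for(root, label):
    """Return the second <td> of the first row whose first <td> contains label, else None."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Match on the label text rather than on the row's position in the table.
        if len(cells) >= 2 and label in "".join(cells[0].itertext()):
            return cells[1]
    return None


def website_url(root):
    """Return the href of the link in the 'Website' row, or None if the row or link is missing."""
    cell = value_cell_for(root, "Website")
    if cell is None:
        return None
    link = cell.find("a")
    return link.get("href") if link is not None else None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    print(website_url(root))
```

Because each field is looked up by its label, a missing "Website" row yields `None` for that field and leaves the "Type" lookup unaffected.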