Extract text from website source code
<p>I want to extract information from a website:</p>
<pre><code>http://www.website.com
</code></pre>
<p>One string appears several times in the page: "STRING TO CAPTURE". I only want to capture its first occurrence, which is inside the following structure:</p>
<pre><code>&lt;td width="10%" bgcolor="#FFFFFF"&gt;&lt;font class="bodytext9"&gt;1-Jun-2013&lt;/font&gt;&lt;/td&gt;
&lt;td width="4%" bgcolor="#FFFFFF" align=center&gt;&lt;font class="bodytext9"&gt;Sat&lt;/font&gt;&lt;/td&gt;
&lt;td width="4%" bgcolor="#FFFFFF" align="center"&gt;&lt;font class="bodytext9"&gt;TIME&lt;/font&gt;&lt;/td&gt;
&lt;td width="15%" bgcolor="#FFFFFF" align="center"&gt;&lt;a class="black_9" href="link1"&gt;Some Text here&lt;/a&gt;&lt;/td&gt;
&lt;td width="5%" bgcolor="#FFFFFF" align="center"&gt;&lt;font class="bodytext9"&gt;&lt;img src="img/colors/pink.gif"&gt;&lt;/font&gt;&lt;/td&gt;
&lt;td width="5%" bgcolor="#FFFFFF" align="center"&gt;&lt;/td&gt;
&lt;td width="5%" bgcolor="#FFFFFF" align="center"&gt;&lt;font class="bodytext9"&gt;Another Text&lt;/font&gt;&lt;/td&gt;
&lt;td width="5%" bgcolor="#FFFFFF" align="center"&gt;&lt;/td&gt;
&lt;td width="5%" bgcolor="#FFFFFF" align="center"&gt;&lt;font class="bodytext9"&gt;&lt;img src="img/colors/white.gif"&gt;&lt;/font&gt;&lt;/td&gt;
&lt;td width="15%" bgcolor="#FFFFFF" align="center"&gt;&lt;a class="black_9" href="link2"&gt;Here is also Text&lt;/a&gt;&lt;/td&gt;
&lt;td width="15%" bgcolor="#FFFFFF" align="center"&gt;&lt;a href="LINKtoWeb" class=list&gt;&lt;u&gt;STRING TO CAPTURE&lt;/u&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td width="4%" bgcolor="#FFFFFF" align="center"&gt;&lt;a target="_new" href="AnotherLink"&gt;&lt;img src="img/img2.gif" border="0"&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
</code></pre>
<p>This is a fixed format: between the &lt;tr&gt; tags there are 12 lines, each starting with &lt;td&gt;, along with the other tags shown. I want to extract the text from each line, e.g.</p>
<pre><code>1-Jun-2013
Sat
TIME
Some Text here
...
STRING TO CAPTURE
</code></pre>
<p>I also want to extract the link on the line containing "STRING TO CAPTURE", which is:</p>
<pre><code>LINKtoWeb
</code></pre>
<p>I think Python would be well suited to this task, but I am too new to Python to get it working, and I hope the Python experts here can show me how. I have no idea where to start. After searching around, I found this, which might be a solution:</p>
<pre><code>use YAML;
my $data = Load(http://www.website.com);
say $data-&gt;{"&lt;tr&gt;"}-&gt;{"&lt;td&gt;"}-&gt;{"STRING TO CAPTURE"};
</code></pre>
<p>But how do I deal with all the text in these 12 lines?</p>
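One possible way to approach the task described above is sketched here, using only Python's standard-library `html.parser`. The names `RowCollector`, `first_match`, and `SAMPLE` are hypothetical, and the sample markup is a shortened stand-in for the real page. In particular, the opening `<table>` and `<tr>` tags are assumptions, since the question shows only the cells and the closing `</tr>`.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect, for every <tr>, the text and first link of each <td>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows: lists of (text, href) tuples
        self._row = None    # cells of the row being read, or None
        self._cell = None   # [text_parts, href] of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag == "td":
            if self._row is None:        # tolerate a missing <tr>
                self._row = []
            self._cell = [[], None]
        elif tag == "a" and self._cell is not None and self._cell[1] is None:
            self._cell[1] = dict(attrs).get("href")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            parts, href = self._cell
            self._row.append(("".join(parts).strip(), href))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def first_match(html, target):
    """Return (cell_texts, href) for the first row with a cell equal to target."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        for text, href in row:
            if text == target:
                return [t for t, _ in row], href
    return None


# Shortened, assumed sample: two rows that both contain the target string.
SAMPLE = """
<table>
<tr>
<td width="10%"><font class="bodytext9">1-Jun-2013</font></td>
<td width="4%" align=center><font class="bodytext9">Sat</font></td>
<td width="15%"><a class="black_9" href="link1">Some Text here</a></td>
<td width="5%"><font class="bodytext9"><img src="img/colors/pink.gif"></font></td>
<td width="15%"><a href="LINKtoWeb" class=list><u>STRING TO CAPTURE</u></a></td>
</tr>
<tr>
<td><a href="OtherLink"><u>STRING TO CAPTURE</u></a></td>
</tr>
</table>
"""

texts, link = first_match(SAMPLE, "STRING TO CAPTURE")
print(texts)
print(link)
```

Cells that hold only an image or nothing at all come back as empty strings, so the position of each value in the row is preserved. For a live page, the HTML could be fetched first, for example with `urllib.request.urlopen(url).read().decode(...)`. The right encoding depends on the site and is not known from the question.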