Perl: Unexpected behavior with website scraping
<p>I'm using <code>WWW::Mechanize</code> and <code>HTML::TokeParser</code> to parse a website for updates. I cannot give any details about the website because it requires a login. The website essentially has a table of data. I simply parse the HTML until I reach the first row of the table and check whether it matches the value from my last scrape; if it does not, I send a mail. This works perfectly well when I test it on existing table entries. However, when actual updates happen, the scraping doesn't stop at my last scrape: it keeps sending mails until the table is exhausted, and repeats this indefinitely. I cannot figure out what is happening. I know there isn't much anyone can verify without the website, but I'm posting my code anyway. I'd appreciate ideas on what could be going wrong.</p>
<p>Code:</p>
<pre><code>sub func {
    my ($comid, $mechlink) = @_;
    my $mechanize = WWW::Mechanize-&gt;new( noproxy =&gt; 0, stack_depth =&gt; 5, autocheck =&gt; 1 );
    $mechanize-&gt;proxy( https =&gt; undef );
    eval {
        my $me = $mechanize-&gt;get($mechlink);
        $me-&gt;is_success or die $me-&gt;status_line;
    };
    return $comid if ($@);
    my $stream = HTML::TokeParser-&gt;new( \$mechanize-&gt;{content} ) or die $!;
    while ( $tag = $stream-&gt;get_tag('td') ) {
        if ( $tag-&gt;[1]{class} eq 'dateStamp' ) {
            $dt = $stream-&gt;get_trimmed_text('/td');
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $name = $stream-&gt;get_trimmed_text('/td') if ( $tag-&gt;[1]{class} eq 'Name' );
            return $comid unless ( $tag-&gt;[1]{class} eq 'Name' );
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $info = $stream-&gt;get_trimmed_text('/td');
            print "$name?\n";
            return $retval if ($info eq $comid);
            print "You've Got Mail! $info $comid\n";
            $tcount++;
            $retval = $info if ($tcount == 1);
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $tag = $stream-&gt;get_tag;
            $link = "http://www.abc.com".$tag-&gt;[1]{href} if ( $tag-&gt;[0] eq 'a' );
            my $outlook = new Mail::Outlook();
            my $message = $outlook-&gt;create();
            $message-&gt;To('abc@def.com');
            $message-&gt;Cc('abc@def.com;abc@def.com');
            my $hd = "$name - $info";
            $message-&gt;Subject($hd);
            $message-&gt;Body(" ");
            $message-&gt;Attach($link);
            $message-&gt;send;
        }
    }
}
</code></pre>
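One thing visible in the code, though it cannot be confirmed as the cause without the site, is that <code>$tcount</code> and <code>$retval</code> are not declared with <code>my</code> inside <code>func</code>, so their values carry over from one call to the next and <code>$tcount == 1</code> can only be true once per program run. The sketch below isolates just the "stop at the last scrape" bookkeeping with lexical variables. It assumes the rows have already been extracted into a list, newest first; <code>new_rows_since</code> and the sample row values are hypothetical names used for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): given the marker
# remembered from the previous run and the row identifiers in page order
# (newest first), return the marker for the next run plus the new rows.
sub new_rows_since {
    my ($last_seen, @rows) = @_;
    my @fresh;    # lexical, so it starts out empty on every call

    for my $info (@rows) {
        # Stop as soon as we reach the row seen on the previous run.
        last if defined $last_seen && $info eq $last_seen;
        push @fresh, $info;
    }

    # The newest row becomes the next marker; keep the old one if nothing is new.
    my $marker = @fresh ? $fresh[0] : $last_seen;
    return ($marker, @fresh);
}

# Simulated polls of the same table as rows are added on top (made-up data).
my $marker = 'B';
for my $page ( [qw(B A)], [qw(C B A)], [qw(C B A)], [qw(E D C B A)] ) {
    my @fresh;
    ($marker, @fresh) = new_rows_since($marker, @$page);
    print "new: @fresh; marker: $marker\n";
}
```

Because every piece of state is either passed in or declared with <code>my</code>, each poll reports only rows above the previous marker, and a poll with no changes reports nothing. Note that if the remembered row ever disappears from the table, this sketch treats every row as new, so a real script may want a cap on how many mails one poll can send.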
 
