
**thumbnail screenshot of webpages with Perl WWW::Mechanize::Firefox**
i use WWW::Mechanize::Firefox to control a firefox instance and dump the rendered page with `$mech->content_as_png`.

**New update**: see at the end of the initial posting: thanks to user1126070 we have a new solution - which i want to try out later in the day [right now i am in office and not at home, in front of the machine with the programme]:

```perl
$mech->repl->repl->setup_client( { extra_client_args => { timeout => 5*60 } } );
```

i try out the version that puts the links into `@list` and uses `eval`, and do the following:

```perl
while (scalar(@list)) {
    my $link = pop(@list);
    print "trying $link\n";
    eval {
        $mech->get($link);
        sleep(5);
        my $png  = $mech->content_as_png();
        my $name = $link;                # was "$_", which is not set in this loop
        $name =~ s/^www\.//;
        $name .= ".png";
        open(OUTPUT, ">", $name) or die "cannot write $name: $!";
        binmode OUTPUT;                  # PNG is binary data
        print OUTPUT $png;
        close(OUTPUT);
    };                                   # eval {} needs this semicolon
    if ($@) {
        print "link: $link failed\n";
        unshift(@list, $link);           # put it at the other end of the list;
                                         # push would make pop retry it right away
        next;
    }
    print "$link is done!\n";
}
```

**BTW:** user1126070, what about trimming the images down to thumbnail size? Should i use Imager here? Can you suggest a solution here...!? That would be great. (see the Imager sketch in the second PS at the end of this post)

**end of Update**

Here the problem-outline continues - as written at the very **beginning of this Q & A**.

**problem-outline:** I have a list of 2500 websites and need to grab a thumbnail screenshot of each of them. How do i do that? i could try to parse the sites with Perl - Mechanize would be a good thing. Note: i only need the results as thumbnails with a maximum of 240 pixels in the long dimension. At the moment i have a solution which is slow and does not give back thumbnails: how do i make the script run faster, with less overhead, and spit out thumbnails?

i have to be aware that setting it up can pose quite a challenge, though. If all works as expected, you can simply use a script like this to dump images of the desired websites, but you should start Firefox and resize it to the desired width manually (height doesn't matter, WWW::Mechanize::Firefox always dumps the whole page).

What i have **done so far** is a lot - i work with mozrepl. At the moment i struggle with timeouts: is there a way to specify the Net::Telnet timeout with WWW::Mechanize::Firefox?
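My guess - untested, and really just an assumption on my part - is that the `setup_client` call from the update above only helps if it runs right after the constructor, before the very first `get`, so that the extra args reach the underlying Net::Telnet client. Roughly like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

# assumption: the extra client args are only picked up if they are set
# before the first command goes over the Net::Telnet connection
$mech->repl->repl->setup_client(
    { extra_client_args => { timeout => 5 * 60 } }    # seconds
);

$mech->get('http://www.google.com');    # first get, now with the longer timeout
```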
At the moment my internet connection is very slow and sometimes i get this error with `$mech->get()`:

```
command timed-out at /usr/local/share/perl/5.12.3/MozRepl/Client.pm line 186
```

SEE THIS ONE:

```perl
$mech->repl->repl->timeout(100000);
```

Unfortunately it does not work: `Can't locate object method "timeout" via package "MozRepl"`. The documentation says this should do it:

```perl
$mech->repl->repl->setup_client( { extra_client_args => { timeout => 180 } } );
```

What i have tried already - here it is:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open(INPUT, "<", "urls.txt") or die $!;
while (<INPUT>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png  = $mech->content_as_png();
    my $name = $_;
    $name =~ s/^www\.//;
    $name .= ".png";
    open(OUTPUT, ">", $name) or die "cannot write $name: $!";
    binmode OUTPUT;     # PNG is binary data
    print OUTPUT $png;
    close(OUTPUT);      # was missing - the last file could end up truncated
    sleep(5);
}
close(INPUT);
```

Well, this does not care about the size. See the output on the command line:

```
linux-vi17:/home/martin/perl # perl mecha_test_1.pl
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #
```

And here is my source - a snippet-example of the sites i have in the url-list.

urls.txt - the list of sources:

```
www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com
and so on...
```

**BTW:** with that many urls we have to expect that some will fail, and we have to handle that. For example, we put the failed ones in an array or hash and retry them X times (see the retry sketch in the first PS at the end of this post).

UTSL

well, how is this one here...

```perl
sub content_as_png {
    my ($self, $tab, $rect) = @_;
    $tab  ||= $self->tab;
    $rect ||= {};

    # Mostly taken from
    # http://wiki.github.com/bard/mozrepl/interactor-screenshot-server
    my $screenshot = $self->repl->declare(<<'JS');
function (tab, rect) {
    var browser = tab.linkedBrowser;
    var browserWindow = Components.classes['@mozilla.org/appshell/window-mediator;1']
        .getService(Components.interfaces.nsIWindowMediator)
        .getMostRecentWindow('navigator:browser');
    var win  = browser.contentWindow;
    var body = win.document.body;
    if (!body) { return; };
    var canvas = browserWindow
        .document
        .createElementNS('http://www.w3.org/1999/xhtml', 'canvas');
    var left   = rect.left   || 0;
    var top    = rect.top    || 0;
    var width  = rect.width  || body.clientWidth;
    var height = rect.height || body.clientHeight;
    canvas.width  = width;
    canvas.height = height;
    var ctx = canvas.getContext('2d');
    ctx.clearRect(0, 0, width, height);
    ctx.save();
    ctx.scale(1.0, 1.0);
    ctx.drawWindow(win, left, top, width, height, 'rgb(255,255,255)');
    ctx.restore();
    //return atob(
    return canvas
        .toDataURL('image/png', '')
        .split(',')[1]
    // );
}
JS

    my $scr = $screenshot->($tab, $rect);
    return $scr ? decode_base64($scr) : undef   # decode_base64 is from MIME::Base64
};
```

Love to hear from you!

greetings, zero
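PS: for the retry-X-times idea from the BTW above, here is a small sketch of how i would bound the retries (untested; `%tries` and `$MAX_TRIES` are made-up names, the rest is the loop from the update):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
my @list = qw(www.google.com www.cnn.com www.msnbc.com);   # as in urls.txt

my %tries;              # per-link failure counter (made-up helper)
my $MAX_TRIES = 3;      # made-up limit - adjust to taste

while (my $link = pop @list) {
    eval {
        $mech->get($link);
        my $png  = $mech->content_as_png();
        my $name = $link;
        $name =~ s/^www\.//;
        open(my $out, ">", "$name.png") or die $!;
        binmode $out;
        print {$out} $png;
        close($out);
    };
    if ($@) {
        # unshift, not push: pop takes from the end of the list,
        # so push would retry the same link again right away
        unshift(@list, $link) if ++$tries{$link} < $MAX_TRIES;
        next;
    }
    print "$link is done!\n";
}
```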
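PS 2: regarding the Imager question in the update above - a minimal sketch of the trimming i have in mind (untested; it assumes Imager with PNG support is installed, and `save_thumbnail` is a made-up helper). `scale` with `type => 'min'` fits the image into the given box, so the long dimension ends up at 240 pixels:

```perl
use strict;
use warnings;
use Imager;

# made-up helper: takes the raw data from $mech->content_as_png()
# and a target file name, writes a thumbnail of at most 240 pixels
sub save_thumbnail {
    my ($png, $name) = @_;

    my $img = Imager->new();
    $img->read(data => $png, type => 'png')
        or die "cannot read png data: " . $img->errstr;

    # 'min' picks the smaller scale factor, so the result fits
    # inside 240x240 and the long dimension becomes 240 pixels
    my $thumb = $img->scale(xpixels => 240, ypixels => 240, type => 'min');

    $thumb->write(file => $name, type => 'png')
        or die "cannot write $name: " . $thumb->errstr;
}

# usage, with $mech from the scripts above:
# save_thumbnail($mech->content_as_png(), "google.com.png");
```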
 
