StackOverflow2013

POWeb crawler: Using Perl's MozRepl module to deal with Javascript
primarykey
Id
7765537
data
AcceptedAnswerId
7767604
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2011-10-14T09:18:57.220
FavoriteCount
1
LastActivityDate
2011-10-14T12:31:06.070
LastEditDate
2011-10-14T09:25:56.793
LastEditorUserId
258334
OwnerUserId
258334
ParentId
0
PostTypeId
1
Score
4
ViewCount
1369
LastEditorDisplayName
text
Body
I am trying to save a couple of web pages using a web crawler. Usually I prefer doing this with Perl's <code>WWW::Mechanize</code> module. However, as far as I can tell, the site I am trying to crawl relies heavily on JavaScript, which seems hard to avoid. Therefore I looked into the following Perl modules: <ul> <li><a href="http://search.cpan.org/~corion/WWW-Mechanize-Firefox-0.55/lib/WWW/Mechanize/Firefox.pm" rel="nofollow">WWW::Mechanize::Firefox</a></li> <li><a href="http://search.cpan.org/~zigorou/MozRepl-0.06/lib/MozRepl.pm" rel="nofollow">MozRepl</a></li> <li><a href="http://search.cpan.org/~corion/MozRepl-RemoteObject-0.28/lib/MozRepl/RemoteObject.pm" rel="nofollow">MozRepl::RemoteObject</a></li> </ul> The Firefox <a href="https://github.com/bard/mozrepl/wiki" rel="nofollow">MozRepl extension</a> itself works perfectly. I can use the terminal to navigate the web site just the way it is shown in the developer's tutorial, at least in theory. However, I have no JavaScript experience and am therefore having a hard time using the modules properly. Here is the source I would like to start from: <a href="http://www.morganstanley.com/eqr/disclosures/webapp/coverage" rel="nofollow">Morgan Stanley</a>. For a couple of the firms listed beneath 'Companies - as of 10/14/2011' I would like to save their respective pages. For example, clicking on the first listed company (i.e. '1-800-Flowers.com, Inc') calls a JavaScript function with two arguments, <code>dtxt('FLWS.O','2011-10-14')</code>, which produces the desired new page. This is the page I would like to save locally. With Perl's <code>MozRepl</code> module I thought about something like this: <pre><code>use strict;
use warnings;
use MozRepl;

# Connect to the MozRepl extension running inside Firefox
my $repl = MozRepl->new;
$repl->setup;

# Open the coverage page, then switch the REPL context to the page content
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');
$repl->repl_enter({ source => "content" });

# Call the page's own function to load the company's disclosure page
$repl->execute('dtxt("FLWS.O", "2011-10-14")');
</code></pre> Now I would like to save the resulting HTML page.
So again, the code I would like to write should visit the HTML page of each of a couple of firms and simply save it. (Here are, e.g., three firms: MMM.N, FLWS.O, SSRX.O) <ol> <li>Is it correct that I cannot get around the page's JavaScript functions and therefore cannot use <code>WWW::Mechanize</code>?</li> <li>Following on from question 1, are the mentioned Perl modules a plausible approach to take?</li> <li>And finally, if the first two questions can be answered with yes, it would be really nice if you could help me out with the actual coding. E.g. in the above code, the essential part that is missing is a <code>'save'-command</code>. (Maybe using Firefox's <code>saveDocument</code> function?)</li> </ol>
Tags
<javascript><perl><firefox>
Title
Web crawler: Using Perl's MozRepl module to deal with Javascript
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USmropa
UserOwnerUserId
1. USmropa
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POWeb crawler: Using Perl's MozRepl module to deal with Javascript
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POWeb crawler: Using Perl's MozRepl module to deal with Javascript
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POWeb crawler: Using Perl's MozRepl module to deal with Javascript
 UserUserId
 USr4.
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. This table or related slice is empty.
