<p>You need to think carefully about how the crawler should behave, especially how it decides to crawl to another page. That logic is concentrated in the <code>crawl</code> method:</p>
<ol>
<li><p>If <code>n &lt; 0</code>, you have crawled deep enough and don't want to do anything, so simply return.</p></li>
<li><p>Otherwise, analyze the page. Then crawl to each of the new urls with a depth of <code>n-1</code>.</p></li>
</ol>
<p>Part of the confusion, I think, is that you're keeping a queue of urls to visit while also crawling recursively. That means the queue contains not only the children of the last crawled url, which you want to visit in order, but also children of other nodes that were crawled but have not yet been fully processed. It's hard to manage the shape of the depth-first search that way.</p>
<p>Instead, I would remove the <code>will_visit</code> variable and have <code>analyze</code> return a list of the links it found. Then process that list according to step 2 above, something like:</p>
<pre><code># Crawl this page and process its links
child_urls = self.analyze(url)
for u in child_urls:
    if u in self.visited:
        continue  # Do nothing, because it's already been visited
    self.crawl(u, n-1)
</code></pre>
<p>For this to work, you also need to change <code>analyze</code> so that it simply returns the list of urls rather than adding them to <code>will_visit</code>:</p>
<pre><code>def analyze(self, url):
    ...
    urls = collector.getLinks()
    return urls
</code></pre>
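<p>To see the two steps working together, here is a minimal, self-contained sketch. It is not your actual class: the <code>Crawler</code> name, the <code>link_graph</code> dictionary (a hypothetical stand-in for fetching and parsing real pages), and the use of a list for <code>visited</code> are all assumptions made so the example runs without a network.</p>

```python
class Crawler:
    """Depth-limited recursive crawler over an in-memory link graph."""

    def __init__(self, link_graph):
        # Hypothetical stand-in for real page fetching: maps each url
        # to the list of urls that would be found on that page.
        self.link_graph = link_graph
        # Urls in the order they were analyzed. A list keeps the order
        # visible; a set would give faster membership checks.
        self.visited = []

    def analyze(self, url):
        # Record the visit and return the links found on the page,
        # instead of pushing them onto a shared queue.
        self.visited.append(url)
        return self.link_graph.get(url, [])

    def crawl(self, url, n):
        # Step 1: a negative depth means we have gone deep enough.
        if n < 0:
            return
        # Step 2: analyze the page, then recurse into each new link.
        for u in self.analyze(url):
            if u in self.visited:
                continue  # already visited, nothing to do
            self.crawl(u, n - 1)


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

crawler = Crawler(graph)
crawler.crawl("a", 1)
print(crawler.visited)  # ['a', 'b', 'c']
```

<p>With a depth of 1, the start page and its direct children are analyzed, while <code>d</code> is reached with a depth of -1 and skipped. Because there is no shared queue, the order of visits follows the recursion directly.</p>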
 
