    <p>From your comments, it looks like you crawled something else earlier, and that is why Nutch now starts crawling Wikipedia.</p> <p>When you crawl with Nutch, it records metadata in a table (if you use HBase, it is a table named webpage). When you finish one crawl and start a new one, that table is scanned, and any record whose metadata says "this record can be fetched again because its next fetch time has passed" is picked up. Nutch then fetches those old URLs along with your new ones.</p> <p>So if you want only <a href="http://www.tigerdirect.com/" rel="nofollow">http://www.tigerdirect.com/</a> crawled on your system, you have to clean up that table first. If you use HBase, start the shell:</p> <pre><code>./bin/hbase shell </code></pre> <p>then disable the table:</p> <pre><code>disable 'webpage' </code></pre> <p>and finally drop it:</p> <pre><code>drop 'webpage' </code></pre> <p>I could have truncated the table instead, but I chose to drop it.</p> <p>Next, put this URL into your seed.txt:</p> <pre><code>http://www.tigerdirect.com/ </code></pre> <p>Open regex-urlfilter.txt, which is located at:</p> <pre><code>nutch/runtime/local/conf </code></pre> <p>and write this line into it:</p> <pre><code>+^http://([a-z0-9]*\.)*www.tigerdirect.com/([a-z0-9]*\.)* </code></pre> <p>This line replaces the existing <code>+.</code> line.</p> <p>The pattern above also allows subdomains of tigerdirect; whether to keep that is up to you.</p> <p>After that, you can send the crawled data to Solr for indexing and search it. I have tried this and it works. You may still hit some errors on the Nutch side, but that is a separate topic.</p>
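As a quick sanity check before crawling, the filter line above can be tried against a few sample URLs. This is only a rough sketch in Python, not part of Nutch. It assumes the filter behaves as "first matching rule decides, no match means the URL is rejected" and that a rule matches when the pattern is found anywhere in the URL. The helper name `accepts` and the sample URLs are made up for illustration, and Python's `re` is only an approximation of the Java regex engine Nutch uses.

```python
import re

# The rule from regex-urlfilter.txt, split into its sign ("+" = accept)
# and the pattern itself. The pattern text is copied unchanged.
RULES = [
    ("+", re.compile(r"^http://([a-z0-9]*\.)*www.tigerdirect.com/([a-z0-9]*\.)*")),
]

def accepts(url, rules=RULES):
    """Hypothetical helper: return True if the first matching rule is a '+' rule.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False

if __name__ == "__main__":
    samples = [
        "http://www.tigerdirect.com/",
        "http://www.tigerdirect.com/applications/category.asp",
        "http://en.wikipedia.org/wiki/Main_Page",
        "https://www.tigerdirect.com/",
        "http://tigerdirect.com/",
    ]
    for url in samples:
        print(url, accepts(url))
```

Under these assumptions, the Wikipedia URL is rejected, and so are the `https://` and non-`www` variants of the site, because the pattern requires both `http://` and `www.` in the host. If the site redirects to either of those forms, the rule would need to be widened accordingly.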