
Nutch crawl no error, but result is nothing
I am trying to crawl some URLs with Nutch 2.1, following http://wiki.apache.org/nutch/NutchTutorial:

```
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
```

There is no error, but the following folders are never created:

```
crawl/crawldb
crawl/linkdb
crawl/segments
```

Can anyone help me? I have not been able to resolve this for two days. Thanks a lot!

The output is as follows:

```
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread1, activeThreads=0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread5, activeThreads=4
-finishing thread FetcherThread6, activeThreads=3
-finishing thread FetcherThread7, activeThreads=2
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0.0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread1, activeThreads=0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread5, activeThreads=4
-finishing thread FetcherThread6, activeThreads=3
-finishing thread FetcherThread7, activeThreads=2
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0.0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread9, activeThreads=9
-finishing thread FetcherThread0, activeThreads=8
-finishing thread FetcherThread1, activeThreads=7
-finishing thread FetcherThread2, activeThreads=6
-finishing thread FetcherThread3, activeThreads=5
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread5, activeThreads=3
-finishing thread FetcherThread6, activeThreads=2
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0.0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: parsing all
```

runtime/local/conf/nutch-site.xml:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>http.robots.agents</name>
    <value>My Nutch Spider</value>
    <description>The agent strings we'll look for in robots.txt files,
    comma-separated, in decreasing order of precedence. You should put the
    value of http.agent.name as the first agent name, and keep the default *
    at the end of the list. E.g.: BlurflDev,Blurfl,*</description>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>262144</value>
  </property>
</configuration>
```

runtime/local/conf/regex-urlfilter.txt:

```
# accept anything else
+.
```

runtime/local/urls/seed.txt:

```
http://nutch.apache.org/
```
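The line `QueueFeeder finished: total 0 records` in the log means the fetcher never receives any URLs, so the problem is upstream of fetching. One way to narrow it down is to run the crawl cycle step by step instead of the monolithic `crawl` command and watch which step first reports zero records. This is a sketch based on the Nutch 2.x command set from the tutorial (inject/generate/fetch/parse/updatedb); exact options may differ slightly in 2.1:

```shell
# Run from runtime/local so the relative seed directory "urls" resolves
# to runtime/local/urls, where seed.txt actually lives.
cd runtime/local

# 1. Inject the seed list into the web table (stored in HBase here).
bin/nutch inject urls

# 2. Generate a fetch list of up to 5 top-scoring URLs.
bin/nutch generate -topN 5

# 3. Fetch every URL in the generated batch.
bin/nutch fetch -all

# 4. Parse the fetched pages.
bin/nutch parse -all

# 5. Update the web table with the parse results.
bin/nutch updatedb
```

If `inject` already reports nothing, the seed directory path (relative to where the command is run) or the URL filters would be the first things to check.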