Zend Lucene exhausts memory when indexing
<p>An oldish site I'm maintaining uses Zend Lucene (ZF 1.7.2) as its search engine. I recently added two new tables to be indexed, together containing about 2000 rows of text data ranging between 31 bytes and 63 kB.</p>
<p>The indexing worked fine a few times, but after the third run or so it started terminating with a fatal error after exhausting its allocated memory. The PHP memory limit was originally set to 16M, which was enough to index all the other content: 200 rows of text at a few kilobytes each. I gradually increased the memory limit to 160M, but it still isn't enough and I can't raise it any higher.</p>
<p>When indexing, I first need to clear the previously indexed results, because the path scheme contains numbers which Lucene seems to treat as stopwords, returning every entry when I run this search:</p>
<pre><code>$this-&gt;index-&gt;find('url:/tablename/12345');
</code></pre>
<p>After clearing all of the results I reinsert them one by one:</p>
<pre><code>foreach ($urls as $v) {
    $doc = new Zend_Search_Lucene_Document();
    // Body text: tokenized and indexed, but not stored in the index
    $doc-&gt;addField(Zend_Search_Lucene_Field::UnStored('content', $v['data']));
    // Title, description and URL: tokenized, indexed and stored
    $doc-&gt;addField(Zend_Search_Lucene_Field::Text('title', $v['title']));
    $doc-&gt;addField(Zend_Search_Lucene_Field::Text('description', $v['description']));
    $doc-&gt;addField(Zend_Search_Lucene_Field::Text('url', $v['path']));
    $this-&gt;index-&gt;addDocument($doc);
}
</code></pre>
<p>After about a thousand iterations the indexer runs out of memory and crashes. Strangely, doubling the memory limit only gets it a few dozen rows further.</p>
<p>I've already tried adjusting the MergeFactor and MaxMergeDocs parameters (to values of 5 and 100 respectively) and calling <code>$this-&gt;index-&gt;optimize()</code> every 100 rows, but neither helps consistently.</p>
<p>Clearing the whole search index and rebuilding it seems to result in a successful indexing run most of the time, but I'd prefer a more elegant and less CPU-intensive solution. Is there something I'm doing wrong? Is it normal for the indexing to hog so much memory?</p>