Zend Search Lucene not returning expected results
<p>I've created a simple index using Zend_Search_Lucene for searching a list of company names, as I want to be able to offer a search which is more intelligent than a simple MySQL 'LIKE %query%'. I've used the code below, where 'companyname' is the company name and 'document_id' is a unique ID for each document (I'm aware that Lucene assigns one internally, but I understand that can change, whereas my document ID will be static).</p>

<pre><code>$index = Zend_Search_Lucene::create('test-index');

$document = new Zend_Search_Lucene_Document();
$document-&gt;addField(Zend_Search_Lucene_Field::UnIndexed('document_id', 1));
$document-&gt;addField(Zend_Search_Lucene_Field::Text('companyname', 'XYZ Holdings'));
$index-&gt;addDocument($document);

$document = new Zend_Search_Lucene_Document();
$document-&gt;addField(Zend_Search_Lucene_Field::UnIndexed('document_id', 2));
$document-&gt;addField(Zend_Search_Lucene_Field::Text('companyname', 'X.Y.Z. (Holdings) Ltd'));
$index-&gt;addDocument($document);

$document = new Zend_Search_Lucene_Document();
$document-&gt;addField(Zend_Search_Lucene_Field::UnIndexed('document_id', 3));
$document-&gt;addField(Zend_Search_Lucene_Field::Text('companyname', 'X Y Z Ltd'));
$index-&gt;addDocument($document);

$index-&gt;commit();
</code></pre>

<p>However, when I run the following code to find all companies with variants of 'XYZ' in their name:</p>

<pre><code>$index = Zend_Search_Lucene::open('test-index');
$hits = $index-&gt;find('companyname:XYZ');

foreach ($hits as $hit) {
    print "ID: " . $hit-&gt;document_id . "\n";
    print "Score: " . $hit-&gt;score . "\n";
    print "Company: " . $hit-&gt;companyname . "\n";
}
</code></pre>

<p>I end up with the following:</p>

<pre><code>ID: 1
Score: 1
Company: XYZ Holdings
</code></pre>

<p>I was expecting XYZ to match all the documents, as the point of having this search is to pick up companies which have the same name but slightly different punctuation, which can't be catered for in a simple LIKE clause. Is there a reason why Lucene doesn't match all the documents, and is there something I can do to fix this?</p>

<p>I get the same sort of problem if I search for 'companyname:"x.y.z holding"': this doesn't match anything, but 'companyname:"x.y.z holdings"' does. I'd expect Lucene to work out that 'holding' and 'holdings' are sufficiently close to be considered a match.</p>

<p>I'm fairly sure all the documents are indexed, because if I search for 'X.Y.Z' I get matches for documents 2 and 3.</p>

<p>Edit: Forgot to mention the PHP version (5.3.5-1ubuntu7.4 with Suhosin-Patch) and the Zend Framework version (1.11.10-0ubuntu1).</p>
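One way to see why the three queries behave differently is to model how the names might be split into index terms. The sketch below is a plain-PHP approximation, not Zend's own code. It assumes that the default ZF 1.11 analyzer (believed to be Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive) treats runs of letters as words and lower-cases them, and that a query which splits into several words is matched as a phrase. That assumption is consistent with the results described above. `tokenize`, `containsPhrase` and `matchingIds` are hypothetical helpers, and the `companyname:` field prefix is left out.

```php
<?php
// Rough model of how the index is ASSUMED to split company names into terms:
// runs of ASCII letters are words, everything else is a separator, and every
// word is lower-cased. This approximates the default ZF1 text analyzer; it is
// not Zend's own code.
function tokenize($text)
{
    $words = preg_split('/[^A-Za-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('strtolower', $words);
}

// True if the $needle tokens appear consecutively in $haystack. A single-token
// needle is simply an exact term lookup; nothing is stemmed or fuzzy-matched.
function containsPhrase(array $haystack, array $needle)
{
    $n = count($needle);
    if ($n === 0) {
        return false;
    }
    for ($i = 0; $i + $n <= count($haystack); $i++) {
        if (array_slice($haystack, $i, $n) === $needle) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: IDs of the documents whose name contains the query's
// tokens as a phrase. The 'companyname:' field prefix is ignored here.
function matchingIds(array $docs, $query)
{
    $needle = tokenize($query);
    $ids = array();
    foreach ($docs as $id => $name) {
        if (containsPhrase(tokenize($name), $needle)) {
            $ids[] = $id;
        }
    }
    return $ids;
}

// The three company names from the question, keyed by document_id.
$docs = array(
    1 => 'XYZ Holdings',
    2 => 'X.Y.Z. (Holdings) Ltd',
    3 => 'X Y Z Ltd',
);

foreach (array('XYZ', 'X.Y.Z', 'x.y.z holding', 'x.y.z holdings') as $query) {
    printf("%s => terms [%s], documents [%s]\n",
        $query,
        implode(' ', tokenize($query)),
        implode(', ', matchingIds($docs, $query)));
}
```

Under these assumptions the model reproduces what the question reports. 'XYZ' is the single term xyz, which only document 1 contains. 'X.Y.Z' becomes the terms x y z, which appear in documents 2 and 3. "x.y.z holding" matches nothing, because the index holds the term holdings, not holding.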