<h1>Lucene foreign chars problem</h1>
<p>I'm having serious issues using Zend_Lucene with foreign characters like åäö. The issues appear both when the index is created and when it's queried. I've tried both ISO-8859-1 and UTF-8.</p>
<h2>ISO-8859-1</h2>
<p>The query that doesn't work looks like "<code>+_area:skåne</code>". With Zend_Lucene I get no matches, but if I run the same query in Luke I get many matching documents.</p>
<p>The index contains 20 fields. The "_area" field is added with the following syntax:</p>
<pre><code>$doc-&gt;addField(Zend_Search_Lucene_Field::keyword('_area', strtolower($item['area']), 'iso-8859-1'));
</code></pre>
<p>I am using the <code>Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive</code> analyzer.</p>
<p>While indexing, the error message below sometimes appeared (the indexed documents were randomly selected from a DB with ISO-8859-1 encoding):</p>
<blockquote>
<p>Notice: iconv(): Detected an illegal character in input string in TextNum.php.</p>
</blockquote>
<p>This was "solved" by checking if <code>$this-&gt;_input</code> is empty, as it seemed that this caused the notices. Note: the weird query results were already there before this change.</p>
<p>When I search keyword fields using foreign characters I receive the notice above, but when I search text fields it behaves differently: it generates about a hundred of the notice below.</p>
<blockquote>
<p>Notice: Undefined offset: 1996 in \Zend\Search\Lucene\Search\Query\MultiTerm.php on line 472</p>
</blockquote>
<p>But it produces what looks like a correct result set! On a side note, this second query doesn't return any results in Luke.</p>
<h2>UTF-8</h2>
<p>I've also tried UTF-8 because, to my knowledge, Zend_Lucene uses it internally. Since the data set is ISO-8859-1, I convert it using <code>utf8_encode</code>. But the indexing produces the following errors.</p>
<blockquote>
<p>Notice: Undefined offset: 266979 in \Zend\Search\Lucene\Index\SegmentInfo.php on line 632</p>
<p>Notice: Trying to get property of non-object in \Zend\Search\Lucene\Index\SegmentMerger.php on line 196</p>
<p>Notice: Trying to get property of non-object in \Zend\Search\Lucene\Index\SegmentMerger.php on line 200</p>
<p>Notice: Undefined index: in \Zend\Search\Lucene\Index\SegmentWriter.php on line 231</p>
<p>Notice: Trying to get property of non-object in \Zend\Search\Lucene\Index\SegmentWriter.php on line 231</p>
<p>Notice: Undefined offset: 250595 in \Zend\Search\Lucene\Index\SegmentInfo.php on line 2020</p>
<p>Notice: Trying to get property of non-object in \Zend\Search\Lucene\Index\SegmentInfo.php on line 2020</p>
<p>Notice: Undefined index: in \Zend\Search\Lucene\Index\SegmentWriter.php on line 465 ...</p>
</blockquote>
<hr>
<p>So. Can someone please shed some light? :) I believe (after days of googling) that I'm not the only one experiencing this.</p>