StackOverflow2013

<p>After a close look at the code, I found that the query generated by Haystack was:</p>
<pre><code>{
  "query": {
    "filtered": {
      "filter": {
        "fquery": {
          "query": {
            "query_string": {
              "query": "django_ct:(csi.geoname)"
            }
          },
          "_cache": false
        }
      },
      "query": {
        "query_string": {
          "query": "name_auto:(mid)",
          "default_operator": "or",
          "default_field": "text",
          "auto_generate_phrase_queries": true,
          "analyze_wildcard": true
        }
      }
    }
  },
  "from": 0,
  "size": 6
}
</code></pre>
<p>Running this query directly in Elasticsearch gave me the same 6 objects that Haystack was showing, but if I added this to the "query_string":</p>
<pre><code>"analyzer": "standard"
</code></pre>
<p>it worked as desired. So the goal was to be able to set up a different search analyzer for the field.</p>
<p>Based on the link in @user954994's answer and the explanation in <a href="https://stackoverflow.com/questions/15923480/elastic-search-search-analyzer-vs-index-analyzer">this post</a>, what I finally did to make it work was:</p>
<ol>
<li>I created my own custom Elasticsearch backend, adding a new custom analyzer based on the standard one.</li>
<li>I added a custom EdgeNgramField that makes it possible to set one analyzer for indexing (index_analyzer) and a different one for searching (search_analyzer).</li>
</ol>
<p>So, my new settings are:</p>
<pre><code>ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                },
                # New analyzer, based on the standard one, used at search time
                "suggest_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase",
                        "asciifolding"
                    ]
                },
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                }
            }
        }
    }
}
</code></pre>
<p>My new custom build_schema method looks as follows:</p>
<pre><code>def build_schema(self, fields):
    content_field_name, mapping = super(CustomElasticBackend,
                                        self).build_schema(fields)

    for field_name, field_class in fields.items():
        field_mapping = mapping[field_class.index_fieldname]

        index_analyzer = getattr(field_class, 'index_analyzer', None)
        search_analyzer = getattr(field_class, 'search_analyzer', None)
        field_analyzer = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)

        if field_mapping['type'] == 'string' and field_class.indexed:
            if (not hasattr(field_class, 'facet_for') and
                    field_class.field_type not in ('ngram', 'edge_ngram')):
                field_mapping['analyzer'] = field_analyzer

        # Separate index/search analyzers replace the single 'analyzer' key
        if index_analyzer and search_analyzer:
            field_mapping['index_analyzer'] = index_analyzer
            field_mapping['search_analyzer'] = search_analyzer
            # pop() avoids a KeyError if no 'analyzer' was set for this field
            field_mapping.pop('analyzer', None)

        mapping.update({field_class.index_fieldname: field_mapping})

    return (content_field_name, mapping)
</code></pre>
<p>After rebuilding the index, my mapping looks like this:</p>
<pre><code>modelresult: {
    _boost: {
        name: "boost",
        null_value: 1
    },
    properties: {
        django_ct: {
            type: "string"
        },
        django_id: {
            type: "string"
        },
        name_auto: {
            type: "string",
            store: true,
            term_vector: "with_positions_offsets",
            index_analyzer: "edgengram_analyzer",
            search_analyzer: "suggest_analyzer"
        }
    }
}
</code></pre>
<p>Now everything is working as expected!</p>
<p><strong>UPDATE:</strong></p>
<p>Below you'll find the code to clarify this part:</p>
<blockquote>
<ol>
<li>I created my own custom Elasticsearch backend, adding a new custom analyzer based on the standard one.</li>
<li>I added a custom EdgeNgramField that makes it possible to set one analyzer for indexing (index_analyzer) and a different one for searching (search_analyzer).</li>
</ol>
</blockquote>
<p>In my app's search_backends.py:</p>
<pre><code>from django.conf import settings
from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend
from haystack.backends.elasticsearch_backend import ElasticsearchSearchEngine
from haystack.fields import EdgeNgramField as BaseEdgeNgramField


# Custom Backend
class CustomElasticBackend(ElasticsearchSearchBackend):

    DEFAULT_ANALYZER = None

    def __init__(self, connection_alias, **connection_options):
        super(CustomElasticBackend, self).__init__(
            connection_alias, **connection_options)

        user_settings = getattr(settings, 'ELASTICSEARCH_INDEX_SETTINGS', None)
        self.DEFAULT_ANALYZER = getattr(settings, 'ELASTICSEARCH_DEFAULT_ANALYZER', "snowball")

        # Override Haystack's default index settings with the ones from settings.py
        if user_settings:
            setattr(self, 'DEFAULT_SETTINGS', user_settings)

    def build_schema(self, fields):
        content_field_name, mapping = super(CustomElasticBackend,
                                            self).build_schema(fields)

        for field_name, field_class in fields.items():
            field_mapping = mapping[field_class.index_fieldname]

            index_analyzer = getattr(field_class, 'index_analyzer', None)
            search_analyzer = getattr(field_class, 'search_analyzer', None)
            field_analyzer = getattr(field_class, 'analyzer', self.DEFAULT_ANALYZER)

            if field_mapping['type'] == 'string' and field_class.indexed:
                if (not hasattr(field_class, 'facet_for') and
                        field_class.field_type not in ('ngram', 'edge_ngram')):
                    field_mapping['analyzer'] = field_analyzer

            # Separate index/search analyzers replace the single 'analyzer' key
            if index_analyzer and search_analyzer:
                field_mapping['index_analyzer'] = index_analyzer
                field_mapping['search_analyzer'] = search_analyzer
                # pop() avoids a KeyError if no 'analyzer' was set for this field
                field_mapping.pop('analyzer', None)

            mapping.update({field_class.index_fieldname: field_mapping})

        return (content_field_name, mapping)


class CustomElasticSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticBackend


# Custom field
class CustomFieldMixin(object):

    def __init__(self, **kwargs):
        # Pull out the analyzer options before the base field sees the kwargs
        self.analyzer = kwargs.pop('analyzer', None)
        self.index_analyzer = kwargs.pop('index_analyzer', None)
        self.search_analyzer = kwargs.pop('search_analyzer', None)
        super(CustomFieldMixin, self).__init__(**kwargs)


class CustomEdgeNgramField(CustomFieldMixin, BaseEdgeNgramField):
    pass
</code></pre>
<p>My index definition goes something like:</p>
<pre><code>from haystack import indexes

from my_app.search_backends import CustomEdgeNgramField


class MyIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name_auto = CustomEdgeNgramField(model_attr='name',
                                     index_analyzer="edgengram_analyzer",
                                     search_analyzer="suggest_analyzer")
</code></pre>
<p>And finally, the settings use the custom backend for the Haystack connection definition:</p>
<pre><code>HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'my_app.search_backends.CustomElasticSearchEngine',
        'URL': 'http://localhost:9200',
        'INDEX_NAME': 'index'
    },
}
</code></pre>
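<p>To illustrate why splitting the index and search analyzers helps, here is a minimal pure-Python sketch. It does not call Elasticsearch or Haystack; the functions only approximate what "edgengram_analyzer" and "suggest_analyzer" from the settings above would produce (ASCII letters only, no asciifolding), and the helper names and the sample city names are made up for the illustration.</p>

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the front edge n-grams of a token, e.g. 'mid' -> ['mi', 'mid']."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def edgengram_analyzer(text):
    """Rough stand-in for 'edgengram_analyzer': lowercase letter runs, then edge n-grams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [gram for token in tokens for gram in edge_ngrams(token)]


def suggest_analyzer(text):
    """Rough stand-in for 'suggest_analyzer': lowercase word tokens, no n-grams."""
    return re.findall(r"\w+", text.lower())


def matches(query, document, search_analyzer):
    """OR-match: true if any analyzed query term is among the indexed terms."""
    indexed_terms = set(edgengram_analyzer(document))
    return any(term in indexed_terms for term in search_analyzer(query))


# Hypothetical sample data, not taken from the original index.
names = ["Midland", "Milan", "Minsk"]

# Same analyzer on both sides: 'mid' is split into 'mi' and 'mid',
# and 'mi' alone is enough to match every name.
same = [n for n in names if matches("mid", n, edgengram_analyzer)]

# Separate search analyzer: 'mid' stays a single term, so only names
# that really start with 'mid' match.
split = [n for n in names if matches("mid", n, suggest_analyzer)]

print(same)
print(split)
```

<p>In this simplified model, analyzing the query with the edge n-gram analyzer matches all three names, while analyzing it with the plain analyzer matches only "Midland". That is the same effect I saw when adding "analyzer": "standard" to the query by hand.</p>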
