ElasticSearch: Index only the fields specified in the mapping
<p>I have an ElasticSearch setup, receiving data to index via a CouchDB river. My problem is that most of the fields in the CouchDB documents are not relevant for search: they are fields used internally by the application (IDs and so on), and I do not want to get false positives because of these fields. Besides, indexing data that is not needed seems to me a waste of resources.</p>
<p>To solve this problem, I have defined a mapping where I specify the fields that I want to be indexed. I am using <a href="http://packages.python.org/pyes/index.html" rel="nofollow">pyes</a> to access ElasticSearch. The process that I follow is:</p>
<ol>
<li>Create the CouchDB river, associated with an index. This apparently also creates the index, and creates a "couchdb" mapping in that index which, as far as I can see, includes all fields, with dynamically assigned types.</li>
<li>Put a mapping, restricting it to the fields that I really want to index.</li>
</ol>
<p>This is the index definition as obtained by:</p>
<pre><code>curl -XGET http://localhost:9200/notes_index/_mapping?pretty=true
{
  "notes_index" : {
    "default_mapping" : {
      "properties" : {
        "note_text" : {
          "type" : "string"
        }
      }
    },
    "couchdb" : {
      "properties" : {
        "_rev" : {
          "type" : "string"
        },
        "created_at_date" : {
          "format" : "dateOptionalTime",
          "type" : "date"
        },
        "note_text" : {
          "type" : "string"
        },
        "organization_id" : {
          "type" : "long"
        },
        "user_id" : {
          "type" : "long"
        },
        "created_at_time" : {
          "type" : "long"
        }
      }
    }
  }
}
</code></pre>
<p>The problem that I have is twofold:</p>
<ul>
<li>The default "couchdb" mapping is indexing all fields. I do not want this. Is it possible to avoid the creation of that mapping? I am confused, because that mapping seems to be the one that is somehow "connecting" to the CouchDB river.</li>
<li>The mapping that I create seems not to have any effect: there are no documents indexed by that mapping.</li>
</ul>
<p>Do you have any advice on this?</p>
<h1>EDIT</h1>
<p>This is what I am actually doing, exactly as typed:</p>
<pre><code>server="localhost"

# Create the index
curl -XPUT "$server:9200/index1"

# Create the mapping
curl -XPUT "$server:9200/index1/mapping1/_mapping" -d '
{
    "type1" : {
        "properties" : {
            "note_text" : {"type" : "string", "store" : "no"}
        }
    }
}
'

# Configure the river
curl -XPUT "$server:9200/_river/river1/_meta" -d '{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "user" : "admin",
        "password" : "admin",
        "db" : "notes"
    },
    "index" : {
        "index" : "index1",
        "type" : "type1"
    }
}'
</code></pre>
<p>The documents in index1 still contain fields other than "note_text", which is the <em>only</em> one that I have specifically mentioned in the mapping definition. Why is that?</p>
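To make the symptom easier to check, here is a minimal Python sketch that lists the fields mapped under each type in the `_mapping` output quoted in the question. It uses only the standard library and does not talk to ElasticSearch; the helper names `fields_by_type` and `extra_fields` are hypothetical, and the JSON is copied from the question rather than fetched live.

```python
import json

# Mapping copied from the GET /notes_index/_mapping output in the question
# (pasted here as a literal; nothing is fetched from a server).
MAPPING_JSON = """
{
  "notes_index": {
    "default_mapping": {
      "properties": {"note_text": {"type": "string"}}
    },
    "couchdb": {
      "properties": {
        "_rev": {"type": "string"},
        "created_at_date": {"format": "dateOptionalTime", "type": "date"},
        "note_text": {"type": "string"},
        "organization_id": {"type": "long"},
        "user_id": {"type": "long"},
        "created_at_time": {"type": "long"}
      }
    }
  }
}
"""


def fields_by_type(mapping, index):
    """Return {type_name: sorted list of mapped field names} for one index."""
    return {
        type_name: sorted(type_def.get("properties", {}))
        for type_name, type_def in mapping[index].items()
    }


def extra_fields(mapping, index, type_name, wanted):
    """Return the fields mapped under type_name that are not in `wanted`."""
    return [f for f in fields_by_type(mapping, index)[type_name] if f not in wanted]


mapping = json.loads(MAPPING_JSON)
summary = fields_by_type(mapping, "notes_index")
unwanted = extra_fields(mapping, "notes_index", "couchdb", {"note_text"})

print(summary)   # fields per mapping type
print(unwanted)  # fields in the "couchdb" type besides note_text
```

Running the same helpers against a fresh `_mapping` response after each step (create index, put mapping, configure river) would show at which point the additional fields appear and under which type name.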