Note that there are some explanatory texts on larger screens.

plurals
  1. POWhen do you start additional Elasticsearch nodes?
    primarykey
    data
    text
    <p>I'm in the middle of attempting to replace a Solr setup with Elasticsearch. This is a new setup, which has not yet seen production, so I have lots of room to fiddle with things and get them working well.</p> <p>I have very, very large amounts of data. I'm indexing some live data and holding onto it for 7 days (by using the _ttl field). I do not store any data in the index (and disabled the _source field). I expect my index to stabilize around <strong>20 billion</strong> rows. I will be putting this data into 2-3 named indexes. Search performance so far with up to a few billion rows is totally acceptable, but indexing performance is an issue. </p> <p><em>I am a bit confused about how ES uses shards internally. I have created two ES nodes, each with a separate data directory, each with 8 indexes and 1 replica. When I look at the cluster status, I only see one shard and one replica for each node. Doesn't each node keep multiple indexes running internally? (Checking the on-disk storage location shows that there is definitely only one Lucene index present).</em> -- Resolved, as my index setting was not picked up properly from the config. Creating the index using the API and specifying the number of shards and replicas has now produced exactly what I would've expected to see.</p> <p><em>Also, I tried running multiple copies of the same ES node (from the same configuration), and it recognizes that there is already a copy running and creates its own working area. These new instances of nodes also seem to only have one index on-disk.</em> -- Now that each node is actually using multiple indices, a single node with many indices is more than sufficient to throttle the entire system, so this is a non-issue.</p> <p>When do you start additional Elasticsearch nodes, for maximum indexing performance? Should I have many nodes each running with 1 index 1 replica, or fewer nodes with tons of indexes? Is there something I'm missing with my configuration in order to have single nodes doing more work? </p> <p>Also: Is there any metric for knowing when an HTTP-only node is overloaded? Right now I have one node devoted to HTTP only, but aside from CPU usage, I can't tell if it's doing OK or not. When is it time to start additional HTTP nodes and split up your indexing software to point to the various nodes?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload