Elasticsearch is used by many companies as a powerful search engine thanks to its speed and scalability. At Dynamic Yield, we use Elasticsearch as part of our recommendations engine, where it handles thousands of requests per second.
An Elasticsearch cluster contains several nodes, each of which can play one or more roles: master, data, ingest, etc.
Indices are stored on the data nodes.
Each index can have one or more shards; the number is usually determined by the index size.
Each shard holds part of the index data (documents), so if, for example, we have an index with two shards, each shard holds ~50% of the data.
That means that if one of those two shards becomes unavailable for some reason (disk or node failure, data corruption, etc.), half of the data will be unavailable, and this is something we definitely want to prevent in our production environment.
Here’s where (shard) replicas come in: each shard can have zero or more replicas, each of which is simply a copy of that shard. That way, if a shard is unavailable, there is always another copy that can be used to serve searches and queries.
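As a minimal sketch, the two-shard index from the example above could be created with one replica per shard like this (my-index is a hypothetical name, and localhost:9200 an assumed endpoint):

# Create an index with 2 primary shards and 1 replica per shard
curl -X PUT "localhost:9200/my-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}
'

That yields four shard copies in total: two primaries and two replicas, spread across the data nodes.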
Before we proceed, there are a few Elasticsearch principles to keep in mind:
- Elasticsearch never places more than one copy of the same shard (primary or replica) on the same node.
- Elasticsearch tries to balance all shards (both primaries and replicas) across all data nodes. It does this by dividing the total number of shards by the number of data nodes (and not by their size! There’s a great post reviewing that issue). Say we have 1,000 shards and 8 data nodes: 125 shards will be allocated to each node.
- If a data node goes down for some reason, all of its shards become “unassigned shards”, and Elasticsearch will try to assign them to other nodes (by duplicating the remaining replicas of those shards). During that time, the cluster status might be red or yellow.
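You can watch this happen through the cluster health API, which reports the overall status along with the number of unassigned shards (assuming Elasticsearch listens on localhost:9200):

# Check cluster status and the unassigned shard count
curl -X GET "localhost:9200/_cluster/health?pretty"

Look at the status and unassigned_shards fields in the response.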
One cool and important feature called Adaptive Replica Selection (introduced in version 6.1) ensures that all those replicas are used in a smart way: Elasticsearch routes each search to the replica whose node is least loaded, so the query is answered as fast as possible.
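Adaptive Replica Selection is controlled by a dynamic cluster setting; if your 6.x version doesn’t have it enabled by default, here is a minimal sketch of turning it on (again assuming localhost:9200):

# Enable Adaptive Replica Selection cluster-wide
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}
'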
So we have at least two reasons for having replicas:
- High availability: if one copy becomes unavailable, there are other copies to cover for it.
- Latency: the more replicas we have, the more nodes can serve our search queries.
Sounds great. So why not have all shards on all nodes? Can each index have replicas across all nodes?
The answer is yes, but let’s break it down: if a node leaves the cluster for some reason, Elasticsearch might freak out. All of that node’s shards will become “unassigned shards”, and Elasticsearch will try to solve the problem by assigning them to other nodes, but it will never succeed because of the first principle above: every remaining node already holds a copy of those shards.
That is why we set our replica count (the index’s number_of_replicas setting) to N-2, where N is the number of data nodes in the cluster. That gives N-1 copies of each shard in total, so in case of unexpected failure (which will most likely happen in the middle of the night), Elasticsearch will be able to recover: each unassigned shard can be reassigned to the one node that doesn’t hold a copy of it yet.
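As a sketch, with N=8 data nodes and a hypothetical index named my-index, that means 6 replicas per shard; number_of_replicas is a dynamic setting, so it can be changed on a live index:

# Set N-2 = 6 replicas per shard on an 8-data-node cluster
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index": {
    "number_of_replicas": 6
  }
}
'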
But what if we have a planned restart? There are several possible reasons for that:
- Node replacement, whether physical or just upgrading the node’s instance type.
- Node termination (to change availability zone, for example).
- Changing Elasticsearch’s static settings, which requires a restart.
- Unexpected behavior, such as high JVM heap usage, that requires restarting the Elasticsearch process.
There are two options: the simple way and the safe way.
The simple way
Just stop the Elasticsearch service:
service elasticsearch stop
Or, more aggressively, just terminate the node.
In this case, the cluster health status will become yellow or red, and Elasticsearch will mark all of the node’s shards as “unassigned shards”. At this point, it will try to recover by assigning those shards wherever it can.
But what about all the in-flight queries that this node was handling? Our production cluster handles thousands of queries per second, and we just stopped a node in the middle of processing. What will happen to all those queries? We don’t want to lose even a single query if we can prevent it!
The safe way
Elasticsearch provides a rich REST API, so why not use it?
We can tell Elasticsearch to relocate all shards away from the node we want to take down, and only once the node is empty do we safely terminate it. That way, stopping the node has no impact. Finally, after bringing the node back to life (or adding a fresh new one), we simply remove the exclusion setting. Simple, isn’t it? Here are the steps:
1. Move all shards off the data node using shard allocation filtering:
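A sketch of the exclusion call, assuming the node’s IP is 10.0.0.1 (a placeholder; you can also filter by node name with _name):

# Exclude the node from allocation so its shards relocate elsewhere
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.1"
  }
}
'

Elasticsearch will immediately start relocating all shards away from the excluded node.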
2. Verify that no shards are left on the node by calling, for example:
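the cat allocation API, which reports the number of shards on each data node (assuming Elasticsearch listens on localhost:9200; the drained node should show 0):

# Show per-node shard counts and disk usage
curl -X GET "localhost:9200/_cat/allocation?v"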
or by calling:
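the cat shards API, which lists every shard along with the node it is assigned to (the drained node should no longer appear in the output):

# List every shard and the node it lives on
curl -X GET "localhost:9200/_cat/shards?v"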
You will see the shard allocation increase on all other data nodes.
3. Stop the Elasticsearch service:
service elasticsearch stop
4. Bring up the new data node (or just start the existing one).
service elasticsearch start
5. Remove the allocation exclusion (by setting an empty string):
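Mirroring the exclusion call from step 1 (same placeholder IP attribute, now cleared):

# Clear the exclusion filter so the node can receive shards again
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": ""
  }
}
'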
That’s it! After the new node joins the cluster, Elasticsearch will rebalance the shard allocation across all available data nodes.
We saw how to perform a safe restart of an Elasticsearch node with minimal risk and only a little extra effort. That way, we keep our production environment safe and stable.
If you have questions, please feel free to ask them here.