Commit 4457385

committed
Changed the order of sections for a more logical flow, re-added the YouTube links, and added a data stream section
Added shard overhead and fixed replica description
1 parent b804b4a commit 4457385

1 file changed

+17
-8
lines changed

docs/reference/how-to/size-your-shards.asciidoc

Lines changed: 17 additions & 8 deletions
@@ -1,23 +1,29 @@
11
[[size-your-shards]]
22
== Size your shards
3-
4-
Proper shard sizing is crucial for maintaining the performance and stability of an {es} cluster. _Oversharding_ occurs when data is distributed across an excessive number of shards, which can degrade search performance and make the cluster unstable. Conversely, very large shards may slow down search operations and prolong recovery times after failures. To strike the right balance, the <<shard-size-recommendation,general guidelines>> are to aim for shard sizes between 10GB and 50GB, keeping the per-shard document count below 200 million. To ensure that each node is working optimally, it's important to distribute shards evenly across nodes. Having uneven shard distribution will make some nodes work harder than others leading to performance degradation and potential instability.
5-
6-
Despite these general guidelines, it is good to develop a tailored <<create-a-sharding-strategy, sharding strategy>> that considers your specific infrastructure, use case, and performance expectations.
7-
83
[discrete]
94
[[what-is-a-shard]]
105
=== What is a shard?
116

12-
A shard is a basic unit of storage in {es}. Every index is divided into one or more shards to help distribute data and workload across nodes in a cluster. This division allows {es} to handle large datasets and perform operations like searches and indexing efficiently. Here’s how shards work:
13-
7+
A shard is a basic unit of storage in {es}. Every index is divided into one or more shards to help distribute data and workload across nodes in a cluster. This division allows {es} to handle large datasets and perform operations like searches and indexing efficiently, but not without cost. Each index and shard has some overhead, and if you divide your data across too many shards, that overhead will degrade performance. Shards play several key roles in {es}:
148

159
* *Data Distribution:* Each shard contains a portion of the data from the index. When you add more nodes to your cluster, {es} will spread the shards across the nodes, balancing the workload between them.
16-
* *Replication:* Shards can have replicas which are copies of the original shard. Replicas ensure data availability and improve search performance by allowing multiple nodes to handle requests.
10+
* *Replication:* Shards can have replicas, which are copies of the original shard. Replicas ensure data availability and improve search performance by allowing multiple nodes to handle requests for that shard.
1711
* *Parallel Processing:* Shards enable {es} to process queries in parallel across nodes, making searches faster and more efficient.
1812

1913
By effectively using shards, {es} can scale horizontally and provide fault tolerance, ensuring your data is distributed and queries are processed efficiently.
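
As a minimal sketch of how these roles map to index settings, the following request creates an index with three primary shards and one replica of each. The index name `my-index-000001` and the shard and replica counts are illustrative assumptions, not recommendations.

[source,console]
----
# Hypothetical index name and counts, chosen only for illustration
PUT /my-index-000001
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}
----

With these settings, {es} has three primary shards and three replica copies to allocate across the available nodes.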
2014

15+
[discrete]
16+
[[sizing-shard-guidelines]]
17+
=== Shard sizing guidelines
18+
19+
Proper shard sizing is crucial for maintaining the performance and stability of an {es} cluster. _Oversharding_ occurs when data is distributed across an excessive number of shards, which can degrade search performance and make the cluster unstable. Conversely, very large shards may slow down search operations and prolong recovery times after failures.
20+
21+
To strike the right balance, the <<shard-size-recommendation,general guidelines>> are to aim for shard sizes between 10GB and 50GB, keeping the per-shard document count below 200 million. To ensure that each node is working optimally, it's important to distribute shards evenly across nodes. An uneven shard distribution makes some nodes work harder than others, leading to performance degradation and potential instability.
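
One way to compare an existing cluster against these guidelines is with the cat APIs. The requests below are a sketch: the selected columns and sort order are one reasonable choice, not the only one.

[source,console]
----
# List shards, largest first, with store size in GB and document count
GET _cat/shards?v=true&h=index,prirep,shard,store,docs&s=store:desc&bytes=gb

# Show how many shards each node holds, to spot uneven distribution
GET _cat/allocation?v=true&h=node,shards,disk.percent
----

Shards well outside the 10GB to 50GB range, shards approaching 200 million documents, or nodes holding far more shards than their peers are candidates for a closer look.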
22+
23+
If you are using <<data-streams>>, each data stream consists of multiple backing indices, each with its own set of shards. Proper shard planning for these indices is essential to maintaining performance and stability.
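
As a rough illustration, assuming a data stream named `my-data-stream` (a hypothetical name), the following requests list its backing indices and then the shards that belong to them.

[source,console]
----
# my-data-stream is a placeholder; substitute your own data stream name
GET _data_stream/my-data-stream

# Shards of every backing index in the data stream
GET _cat/shards/my-data-stream?v=true&h=index,prirep,shard,store,docs
----

Because each rollover creates a new backing index with its own shards, the shard count of a data stream grows over time, even if each backing index has only a few shards.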
24+
25+
Despite these general guidelines, you should develop a tailored <<create-a-sharding-strategy, sharding strategy>> that considers your specific infrastructure, use case, and performance expectations.
26+
2127
[discrete]
2228
[[create-a-sharding-strategy]]
2329
=== Create a sharding strategy
@@ -213,6 +219,7 @@ index can be <<indices-delete-index,removed>>. You may then consider setting
213219
<<indices-add-alias,Create Alias>> against the destination index for the source
214220
index's name to point to it for continuity.
215221

222+
See this https://www.youtube.com/watch?v=sHyNYnwbYro[fixing shard sizes video] for an example troubleshooting walkthrough.
216223

217224
[discrete]
218225
[[shard-count-recommendation]]
@@ -576,6 +583,8 @@ PUT _cluster/settings
576583
}
577584
----
578585

586+
See this https://www.youtube.com/watch?v=tZKbDegt4-M[fixing "max shards open" video] for an example troubleshooting walkthrough. For more information, see <<troubleshooting-shards-capacity-issues,Troubleshooting shards capacity>>.
587+
579588
[discrete]
580589
[[troubleshooting-max-docs-limit]]
581590
==== Number of documents in the shard cannot exceed [2147483519]
