It is important to understand that SolrCloud is horizontally scalable, but each node still needs a certain capacity; the CPU, disk, and RAM required for each node must be estimated so that resources can be allocated efficiently. No fixed numbers can be assigned to these parameters, because each application is unique. Every index within an application has its own indexing pattern: the number of documents that need to be indexed per second, the size of those documents, and the fields, tokenization, and storage parameters defined for them. Search patterns likewise differ across indexes belonging to different applications: the number of queries per second and the search parameters vary, and the amount of data retrieved from Solr, along with the faceting and grouping parameters, plays an important role in the resources consumed during querying.
It is therefore difficult to assign fixed numbers to the RAM, CPU, and disk requirements of each node. Ideally, sharding should be driven by the size of each shard rather than the size of the collection as a whole. Routing is another very important parameter of the index: routing documents and queries to specific shards saves a lot of network I/O, and the routing key determines the weight a particular shard carries, both in terms of the number of documents it holds and the number of queries per second it serves.
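As a minimal sketch of document routing (assuming SolrJ 7.x and a collection created with the compositeId router; the collection name mycollection, the field title_t, the prefix customerA, and the ZooKeeper address zk1:2181 are placeholders, not values from this text), the prefix before the ! in a document ID decides which shard the document lands on, and the _route_ query parameter confines a query to the shard holding that prefix:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class RoutingExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble address; point this at your cluster.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.setDefaultCollection("mycollection");

            // With the compositeId router, the prefix before '!' in the id
            // determines the shard the document is placed on.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "customerA!doc-1");
            doc.addField("title_t", "A routed document");
            client.add(doc);
            client.commit();

            // _route_ restricts the query to the shard(s) holding this prefix,
            // avoiding a fan-out to every shard of the collection.
            SolrQuery q = new SolrQuery("*:*");
            q.set("_route_", "customerA!");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}

Without _route_, the query fans out to every shard and the results are merged on the coordinating node; that fan-out is exactly the network I/O that routing saves.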
It is important to provision disk space of at least two to three times the size of the index, because index optimization can temporarily consume more than twice the index size on disk. For example, optimizing a 100 GB index may briefly require an additional 100 to 200 GB of free space.
The ideal way to go about sizing is to set up a few commodity machines as the nodes of SolrCloud and monitor their resource usage. For each node, we need to monitor the following parameters:
CPU usage
RAM usage (the JVM heap as well as the OS cache)
Disk usage and disk I/O
Network I/O
Queries per second and response times
Once these measurements are in place, the nodes hosting certain shards may turn out to be overweight. Such shards need to be split further in order to address scalability issues before they occur; a sketch of a shard split follows.
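As one hedged illustration (not prescribed by the text), the Collections API SPLITSHARD action can be invoked through SolrJ to divide an overweight shard into two sub-shards; mycollection, shard1, and the ZooKeeper address zk1:2181 are assumed placeholder names:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class SplitShardExample {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble address; point this at your cluster.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            // SPLITSHARD divides shard1 of "mycollection" into two sub-shards
            // (shard1_0 and shard1_1); the parent shard keeps serving until
            // the sub-shards are active and in sync.
            CollectionAdminRequest.SplitShard split =
                    CollectionAdminRequest.splitShard("mycollection")
                            .setShardName("shard1");
            split.process(client);
        }
    }
}

Because the parent shard remains available until the sub-shards take over, the split can run against a live collection; the inactive parent can be deleted afterwards.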
The health of SolrCloud can be monitored via the following files or directories in the ZooKeeper server:
clusterstate.json
live_nodes
We need to constantly watch the state of each core in the cluster; live_nodes provides us with the list of currently available nodes. If any node or core goes offline, a notification has to be sent out (a minimal watcher sketch appears at the end of this section). Additionally, it is important to have enough replicas, distributed in such a fashion that a core or a node going down does not affect the availability of the cloud. The following points need to be considered while planning out the nodes, cores, and replicas of SolrCloud:
Each shard should have at least one replica hosted on a node other than the one hosting the shard's leader
Replicas should be spread across nodes so that the failure of any single node or core leaves every shard with at least one active core
Watches on clusterstate.json and live_nodes should trigger notifications whenever a node or core goes down
These action points will keep SolrCloud functioning smoothly.
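To make the live_nodes watch concrete, here is a minimal sketch against the plain ZooKeeper client API (the ensemble address zk1:2181 is a placeholder, and the alerting hook is left as a comment). ZooKeeper watches fire only once, so the sketch re-registers the watch each time it triggers:

import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class LiveNodesWatcher {

    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Placeholder ZooKeeper ensemble address; point this at your cluster.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 15000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        watchLiveNodes(zk);
        Thread.sleep(Long.MAX_VALUE); // keep the process alive to receive events
    }

    private static void watchLiveNodes(ZooKeeper zk) throws Exception {
        // getChildren registers a one-shot watch; re-arm it on every event.
        List<String> nodes = zk.getChildren("/live_nodes", event -> {
            try {
                watchLiveNodes(zk);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        System.out.println("Live nodes: " + nodes);
        // If the list has shrunk since the last read, send a notification here.
    }
}

In practice the same information can also be read through SolrJ's CloudSolrClient, which keeps a cached view of the cluster state; a direct ZooKeeper watch like this one is useful for a standalone monitoring agent.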