Suppose we have a huge collection of over a billion documents and a requirement to create a separate index from a particular set of those documents, or to break the index into two parts on the basis of certain criteria. Migrating documents to another collection makes this possible. We specify a source and a destination collection in SolrCloud and, on the basis of the routing criteria, the matching documents are copied from the source to the destination collection. The forward.timeout parameter sets the period, in seconds, during which write requests for the migrated key that reach the source collection are forwarded to the target collection. The target collection must not receive any other writes while the migrate command is running; otherwise, some writes may be lost.
Let us look at a practical scenario.
We currently have two collections: catcollection and mycollection. catcollection contains documents belonging to the categories books, currency, and electronics. Let us copy the documents belonging to the category currency from catcollection to mycollection.
The query to get the documents belonging to the category currency includes the shard.keys=currency! parameter:
http://solr1:8080/solr/catcollection/select/?q=*:*&rows=15&shard.keys=currency!
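As a minimal sketch, the same query URL can be assembled in Python with the standard library. The host, collection, and shard.keys value come from the URL above; the helper name build_select_url is hypothetical, and wt=json is an added assumption so that the response would come back as JSON.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://solr1:8080/solr"  # host and port taken from the text


def build_select_url(collection, route_key, rows=15):
    """Build a select URL restricted to the shard(s) that hold route_key.

    Hypothetical helper for illustration; it only builds the URL and
    does not contact Solr.
    """
    params = {
        "q": "*:*",
        "rows": rows,
        "shard.keys": route_key,  # e.g. "currency!"
        "wt": "json",             # assumption: request a JSON response
    }
    # urlencode percent-encodes characters such as "*", ":" and "!".
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("catcollection", "currency!"))
```

The special characters in the query and the route key are percent-encoded by urlencode, which Solr decodes back to the original values.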
This query returns 4 documents. Querying mycollection shows that it contains 35 documents. Now, let us copy the currency documents from catcollection to mycollection:
http://solr1:8080/solr/admin/collections?action=MIGRATE&collection=catcollection&split.key=currency!&target.collection=mycollection&forward.timeout=120
Note the following:
The action is MIGRATE.
The source collection is catcollection.
The split.key is currency!. All documents whose ID begins with currency! will be copied to mycollection. split.key corresponds to the routing parameter that we used earlier. If there is no routing parameter, split.key can be identified by the unique ID of the documents.
target.collection refers to the target collection, mycollection.
forward.timeout is the period, in seconds, during which write requests to catcollection for this key are forwarded to mycollection.
A success message is displayed once this completes.
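The parameters listed above can be put together programmatically. The following is a minimal sketch that only builds the MIGRATE URL; the helper name build_migrate_url is hypothetical, and actually sending the request (for example with urllib.request.urlopen) is left out so that the snippet has no side effects on a cluster.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://solr1:8080/solr"  # host and port taken from the text


def build_migrate_url(source, target, split_key, forward_timeout=120):
    """Build a Collections API MIGRATE URL.

    Hypothetical helper for illustration; the parameter names are the
    ones used in the request shown in the text.
    """
    params = {
        "action": "MIGRATE",
        "collection": source,                # source collection
        "split.key": split_key,              # routing prefix, e.g. "currency!"
        "target.collection": target,         # destination collection
        "forward.timeout": forward_timeout,  # seconds
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(build_migrate_url("catcollection", "mycollection", "currency!"))
```

Keeping the URL construction separate from the HTTP call makes it easy to inspect the request before running it against a live collection.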
We can see the routing parameters in the clusterstate.json file. This also includes an expiresAt parameter specifying the time after which requests are no longer forwarded to the target collection.
Once the migration is over, the destination collection, mycollection, will contain 4 more documents, bringing the total to 39. These documents will also remain available in the source collection.
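One way to confirm the outcome is to compare the numFound value of a select query on mycollection before and after the migration. The sketch below assumes the queries were issued with wt=json; the two response strings are hand-written stand-ins that carry only the counts mentioned in the text, not real Solr output, and num_found is a hypothetical helper.

```python
import json


def num_found(response_text):
    """Return the hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


# Hand-written stand-ins for real responses; only numFound matters here.
before = '{"response": {"numFound": 35, "start": 0, "docs": []}}'
after = '{"response": {"numFound": 39, "start": 0, "docs": []}}'

migrated = num_found(after) - num_found(before)
print(migrated)  # 4
```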