Book: Apache Solr Search Patterns

Migrating documents to another collection

Suppose we have a huge collection of over a billion documents and need to build a separate index from a particular set of documents, or to split the index into two parts on the basis of certain criteria. Migrating documents to another collection makes this possible. In SolrCloud, we specify a source and a target collection, and the documents that match the routing key are copied from the source to the target collection. The forward.timeout parameter sets the period, in seconds, during which write requests for that routing key sent to the source collection are forwarded to the target collection. The target collection must not receive any writes while the MIGRATE command is running; otherwise, some writes may be lost.

Let us look at a practical scenario.

We currently have two collections: catcollection and mycollection. catcollection contains documents belonging to the categories books, currency, and electronics. Let us migrate the documents belonging to the category currency from catcollection to mycollection.

The query to get the documents belonging to category currency will include the shard.keys=currency! parameter:

http://solr1:8080/solr/catcollection/select/?q=*:*&rows=15&shard.keys=currency!
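
As a minimal sketch, the same query URL can be assembled in Python rather than typed by hand. The helper name route_query_url is hypothetical; the host, collection, and shard.keys parameter are taken from the example above, and nothing is sent over the network here.

```python
from urllib.parse import urlencode


def route_query_url(base_url, collection, route_key, rows=15):
    """Build a select URL restricted to one routing key (hypothetical helper)."""
    params = {"q": "*:*", "rows": rows, "shard.keys": route_key}
    # safe='*:!' keeps these characters readable instead of percent-encoding them
    return f"{base_url}/{collection}/select?{urlencode(params, safe='*:!')}"


url = route_query_url("http://solr1:8080/solr", "catcollection", "currency!")
print(url)
```

Building the URL from a dictionary keeps the routing key in one place, so the same value can be reused later as split.key.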

The query returns 4 documents with the currency! routing key. On querying the mycollection collection, we find that it contains 35 documents. Now, let us copy the documents from catcollection to mycollection:

http://solr1:8080/solr/admin/collections?action=MIGRATE&collection=catcollection&split.key=currency!&target.collection=mycollection&forward.timeout=120

Note the following:

  • The action is MIGRATE.
  • The source collection is catcollection.
  • split.key is currency!. All documents whose ID starts with the currency! prefix will be copied to mycollection. The split.key value is the routing key that we used earlier, that is, the part of the document ID before the ! separator. If there is no routing parameter, split.key can be identified by the unique ID of the documents.
  • target.collection refers to the target mycollection.
  • forward.timeout is the timeout, in seconds, during which write requests to catcollection for documents matching split.key are forwarded to mycollection.
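
The parameters listed above can be put together programmatically. The following is a sketch only: migrate_url is a hypothetical helper, and it just builds the request URL shown earlier without calling Solr.

```python
from urllib.parse import urlencode


def migrate_url(base_url, source, target, split_key, forward_timeout):
    """Build a Collections API MIGRATE URL (hypothetical helper, no request is sent)."""
    params = {
        "action": "MIGRATE",
        "collection": source,            # source collection
        "split.key": split_key,          # routing key of the documents to copy
        "target.collection": target,     # destination collection
        "forward.timeout": forward_timeout,  # seconds of write forwarding
    }
    # safe='!' keeps the routing-key separator unescaped
    return f"{base_url}/admin/collections?{urlencode(params, safe='!')}"


url = migrate_url("http://solr1:8080/solr", "catcollection",
                  "mycollection", "currency!", forward_timeout=120)
print(url)
```

The resulting URL can then be opened in a browser or passed to any HTTP client; writes to the target collection should be paused before the request is issued.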

A success message is displayed once this completes.

We can see the routing parameters in the clusterstate.json file. They include an expiresAt parameter specifying the time after which the forwarding of requests to the target collection stops.
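
To illustrate how such a routing rule might be read, here is a sketch that parses a hand-written sample. The JSON layout (a routingRules entry per shard with routeRanges, expiresAt, and targetCollection fields), the shard name, and all sample values are assumptions for illustration and are not copied from a real cluster; expiresAt is assumed to be a timestamp in epoch milliseconds.

```python
import json
from datetime import datetime, timezone

# Hand-written sample; field names and values are assumptions, not real cluster state.
SAMPLE_STATE = """
{"catcollection": {"shards": {"shard1": {
  "routingRules": {"currency!": {
    "routeRanges": "00000000-0000ffff",
    "expiresAt": "1400000000000",
    "targetCollection": "mycollection"}}}}}}
"""


def routing_rules(state_json, collection):
    """Return (shard, route key, target collection, expiry) for each routing rule."""
    state = json.loads(state_json)
    rules = []
    for shard_name, shard in state[collection]["shards"].items():
        for key, rule in shard.get("routingRules", {}).items():
            # Assumed: expiresAt is epoch milliseconds stored as a string
            expires = datetime.fromtimestamp(int(rule["expiresAt"]) / 1000,
                                             tz=timezone.utc)
            rules.append((shard_name, key, rule["targetCollection"], expires))
    return rules


rules = routing_rules(SAMPLE_STATE, "catcollection")
for shard_name, key, target, expires in rules:
    print(shard_name, key, "->", target, "until", expires.isoformat())
```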

Once the migration is over, the destination collection, mycollection, will contain 4 more documents, bringing the total to 39. Because the documents are copied rather than moved, they also remain available in the source collection.
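
The counts can be checked from Solr's JSON query responses, which report the number of matches in response.numFound. The sketch below uses hand-written stand-in responses carrying the numbers from this example; in practice the text would come from select queries with wt=json, and num_found is a hypothetical helper.

```python
import json


def num_found(response_text):
    """Extract the match count from a Solr JSON query response."""
    return json.loads(response_text)["response"]["numFound"]


# Stand-in responses carrying the counts quoted in the text
target_before = num_found('{"response": {"numFound": 35, "start": 0, "docs": []}}')
migrated = num_found('{"response": {"numFound": 4, "start": 0, "docs": []}}')
target_after = num_found('{"response": {"numFound": 39, "start": 0, "docs": []}}')

# The target should have grown by exactly the number of migrated documents
ok = target_after == target_before + migrated
print("migration count check passed:", ok)
```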

Tip

Solr collections API reference:

.
