Book: Apache Solr Search Patterns

Distance sort and relevancy boost

During spatial search, it may be required to sort the search results on the basis of their distance from a specific geographical location (the lat-lon coordinate). With Solr 4.0, the spatial queries seen earlier are capable of returning a distance-based score for sorting and boosting.

Let us see an example wherein spatial filtering and sorting are applied and the distance is returned as the score simultaneously. Our query will be:

http://localhost:8983/solr/collection1/select/?fl=*,score&sort=score asc&q={!geofilt score=distance sfield=store pt=28.642815,77.368413 d=20}

The query output from Solr shows four results along with their scores. Our results are sorted in ascending order on score, which represents the distance as per our query. Hence, the results that are closest to our location appear on top.

The execution of the previous query yields the following output:

[Figure: Distance sort and relevancy boost]
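
The request above can also be assembled programmatically, which avoids hand-encoding the spaces and braces in the local parameters. The following is a minimal sketch, assuming the host, core name (collection1), and field name (store) from the example URL; the helper name build_geofilt_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode


def build_geofilt_url(base_url, sfield, lat, lon, d, score="distance"):
    """Return a select URL whose main query is a {!geofilt} with a distance score."""
    # Local parameters, written the same way as in the hand-typed query.
    geofilt = f"{{!geofilt score={score} sfield={sfield} pt={lat},{lon} d={d}}}"
    params = {
        "q": geofilt,         # spatial query that also produces the score
        "fl": "*,score",      # return all stored fields plus the score
        "sort": "score asc",  # smallest distance first
    }
    # urlencode percent-encodes braces, spaces, and commas for us.
    return f"{base_url}?{urlencode(params)}"


url = build_geofilt_url(
    "http://localhost:8983/solr/collection1/select",
    sfield="store", lat=28.642815, lon=77.368413, d=20,
)
print(url)
```

Passing score="recipDistance" to the same helper would request the reciprocal score instead of the raw distance.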

To add user keywords to the previous Solr query, we have to supply them in an additional fq parameter, probably with the {!edismax} query parser, because the q parameter is already occupied by the spatial query. We have also used score=distance as a local parameter, which sets the score to the distance, in degrees, from the center of the shape. If we omit this parameter or set it to none, every document gets a score of 1.0.

To perform relevance boosting, we can use the recipDistance option. This option applies a reciprocal function such that a distance of 0 yields a score of 1, and the score gradually decreases as the distance increases, reaching roughly 0.1 at the edge of the query shape and moving closer to 0 for even greater distances.
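
The shape of such a reciprocal curve can be sketched numerically. The snippet below assumes the form c / (distance + c), with c set to one tenth of a reference distance (here, the query radius). Both this constant and the function name recip_distance_score are assumptions for illustration; Solr's exact scaling may differ.

```python
def recip_distance_score(distance, reference_distance):
    """Reciprocal score: 1.0 at distance 0, about 0.1 at the reference distance."""
    # Assumption: the curve constant is one tenth of the reference distance.
    c = 0.1 * reference_distance
    return c / (distance + c)


# Print the score at a few distances for a query radius of 20.
for dist_value in (0, 5, 20, 100):
    print(dist_value, round(recip_distance_score(dist_value, 20), 4))
```

Because the score falls as the distance grows, results boosted this way are sorted in descending order of score, unlike the score=distance case.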

Let us modify our preceding query so that it sorts the results in the same way but does not apply the spatial filter. The modified query is as follows:

fl=*,score&sort=score asc&q={!geofilt score=distance filter=false sfield=store pt=28.642815,77.368413 d=20}

The only change here is the option filter=false. This returns every document in our index, sorted by distance in ascending order. If we execute this query on our index, we get around 45 results (all documents in the index), including those that lie outside the query circle. In this case, the value of d has no effect on the result, as sorting depends only on the distance from the center of the shape. However, d must still be supplied, because the query needs a shape (in our case, a circle) to be defined. If a document doesn't have any point in the spatial field, the distance used will be equal to 0.
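
The difference between the two queries can be emulated outside Solr. This is only a sketch of the behavior described above: the document ids and distances are hypothetical, and the function spatial_search is not a Solr API.

```python
# Hypothetical (document id, distance from the query point) pairs.
docs = [("store-a", 3.2), ("store-b", 48.0), ("store-c", 11.5), ("store-d", 19.9)]


def spatial_search(docs, d, apply_filter=True):
    """Sort by distance ascending; drop documents beyond d only when filtering."""
    hits = [doc for doc in docs if not apply_filter or doc[1] <= d]
    return sorted(hits, key=lambda doc: doc[1])


filtered = spatial_search(docs, d=20)                        # like the first query
unfiltered = spatial_search(docs, d=20, apply_filter=False)  # like filter=false

print([doc_id for doc_id, _ in filtered])
print([doc_id for doc_id, _ in unfiltered])
```

With apply_filter=False, the value of d never enters the computation, which mirrors why d is required but irrelevant in the filter=false query.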

Let us also look at some functions provided by Solr for calculating the distance between vectors in an n-dimensional space.

dist is a function provided by Solr and can be used to calculate the distance between two vectors or points. The function definition is as follows:

dist(power, vector 1 coordinates, vector 2 coordinates)

Note the following:

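  • power: This parameter takes the values 0, 1, 2, and infinite. The values 1 and 2 are the most commonly used and denote the Manhattan (taxicab) distance and the Euclidean distance, respectively. The value 0 gives a sparseness calculation, and infinite gives the infinity norm (the largest coordinate difference).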
  • vector 1 and vector 2 coordinates: These coordinates depend on the space in which calculations are to be done. For a two-dimensional space, vectors 1 and 2 can be (x,y) and (z,w), respectively. For a three-dimensional space, the values of vectors 1 and 2 can be (x,y,z) and (a,b,c), respectively.

Let us study some examples for calling the dist function in Solr:

  • dist(1,x,y,a,b): This function calculates the Manhattan distance between two points (a,b) and (x,y) for each document in the search result
  • dist(2,x,y,a,b): This function calculates the Euclidean distance between two points (a,b) and (x,y) for each document in the search result
  • dist(2,x,y,z,a,b,c): This function calculates the Euclidean distance between (x,y,z) and (a,b,c) for each document

In the previous examples, each letter (x,y,z,a,b,c) is a field name in the indexed document.
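
To make the arithmetic concrete, the following Python function mirrors the argument layout of dist (the power first, then the coordinates of both vectors). It is a sketch that takes plain numbers rather than field names, and it is not Solr's implementation.

```python
import math


def dist(power, *coords):
    """Distance between two vectors passed as one flat list of coordinates."""
    if not coords or len(coords) % 2 != 0:
        raise ValueError("expected the same number of coordinates for both vectors")
    half = len(coords) // 2
    # Absolute difference per dimension between vector 1 and vector 2.
    diffs = [abs(p - q) for p, q in zip(coords[:half], coords[half:])]
    if power == 0:
        # Sparseness: the number of dimensions in which the vectors differ.
        return float(sum(1 for diff in diffs if diff != 0))
    if power == math.inf:
        # Infinity norm: the largest single coordinate difference.
        return float(max(diffs))
    # General case: power 1 is Manhattan, power 2 is Euclidean.
    return float(sum(diff ** power for diff in diffs) ** (1 / power))


print(dist(1, 0, 0, 3, 4))        # Manhattan distance between (0,0) and (3,4)
print(dist(2, 0, 0, 3, 4))        # Euclidean distance between (0,0) and (3,4)
print(dist(2, 1, 2, 3, 1, 2, 5))  # Euclidean distance in three dimensions
```

In Solr, each argument would be a field name or a constant, and the function would be evaluated once per document.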

The dist function can be used for sorting by distance in our query. For example, the following query is intended for sorting results on the basis of the Euclidean distance between points (a,b) and (x,y) in the descending order:

http://localhost:8983/solr/collection1/select?q=*:*&sort=dist(2,a,b,x,y) desc

The dist function is an expensive operation, as it is evaluated for every document in the search result.

sqedist is another function provided by Solr. It calculates the Euclidean distance but does not evaluate the square root, thus saving part of the processing time required by the dist function. It is useful for applications that use the Euclidean distance for, say, scoring, but do not need the actual distance and can work with the squared Euclidean distance instead. The sqedist function does not take the power as its first argument, as the power is fixed at 2 for the Euclidean distance calculation. The function is defined as follows:

sqedist(vector 1 coordinates, vector 2 coordinates)

For a two-dimensional space with points (x,y) and (a,b), the function call will be:

sqedist(x,y,a,b)
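
The reason the square root can be skipped is that squaring preserves order for non-negative values, so sorting by the squared distance gives the same ranking as sorting by the actual distance. A minimal sketch with hypothetical sample points (plain numbers instead of field names, not Solr's implementation):

```python
import math


def sqedist(*coords):
    """Squared Euclidean distance between two vectors given as a flat list."""
    half = len(coords) // 2
    return sum((p - q) ** 2 for p, q in zip(coords[:half], coords[half:]))


origin = (0, 0)
points = [(3, 4), (1, 1), (6, 8)]  # hypothetical sample points

# Rank once by the squared distance and once by the true distance.
by_squared = sorted(points, key=lambda p: sqedist(*p, *origin))
by_actual = sorted(points, key=lambda p: math.sqrt(sqedist(*p, *origin)))

print(by_squared == by_actual)
```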