In the previous chapter, we discussed in depth the problems faced during the implementation of Solr for search operations on an e-commerce website. We saw solutions to the problems and areas where optimizations may be necessary. We also took a look at semantic search and how it can be implemented in an e-commerce scenario.
In this chapter, we will explore Solr with respect to spatial search. We will look at different indexing techniques and study query types that are specific to spatial data. We will also learn different scenario-based filtering and searching techniques for geospatial data.
The topics that we will cover in this chapter are:
With Solr, we can combine location-based data with normal text data in our index. This is termed spatial search or geospatial search.
Earlier versions of Solr (Solr 3.x) provided the following features for spatial search:
The geofilt and bounding box filters
The geodist function to calculate distance
With Solr 4, the following new features have been introduced:
Let us look at an example of storing and searching locations in Solr. We will need two definitions in our Solr schema.xml file: a fieldType named location of class solr.LatLonType, and a dynamicField named *_coordinate of type tdouble. The location type uses _coordinate as the suffix of the subfields in which it indexes the latitude and longitude values:
<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<!-- Type used to index the lat and lon components for the "location" FieldType -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
We will also have to define a field named store of type location, which will hold the geospatial index for each store's location:
<field name="store" type="location" indexed="true" stored="true"/>
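To picture what the subFieldSuffix setting does, the following Python sketch mimics how a "lat,lon" value for the store field could be split into two numeric subfields that match the *_coordinate dynamic field. It is an illustration only, not Solr's implementation; the function name split_lat_lon is hypothetical, and the store_0_coordinate and store_1_coordinate subfield names are an assumption about how the subfields are named.

```python
def split_lat_lon(field_name, value, suffix="_coordinate"):
    """Split a "lat,lon" string into two per-coordinate subfield values.

    Illustrative only: the subfield naming below is an assumption.
    """
    lat_text, lon_text = value.split(",")
    lat, lon = float(lat_text), float(lon_text)
    # Reject values that cannot be valid geographic coordinates.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {
        f"{field_name}_0{suffix}": lat,  # latitude component
        f"{field_name}_1{suffix}": lon,  # longitude component
    }


sub_fields = split_lat_lon("store", "28.643059,77.368885")
print(sub_fields)
```

Because both subfields are declared with stored="false", only the original "lat,lon" string of the store field would be returned in search results; the numeric subfields exist purely for indexing.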
Let us index a few locations into Solr and see how geospatial search works. Go into the exampledocs folder inside the Solr installation and run the following command to index the location.csv file provided with this chapter:
java -Dtype=text/csv -jar post.jar location.csv
Now let us see which stores are near our location. On Google Maps, we can see that our location is 28.643059, 77.368885. Therefore, the query to find stores within 10 km of our location will be:
http://localhost:8983/solr/collection1/select/?q=*:*&fq={!geofilt pt=28.643059,77.368885 sfield=store d=10}
Our query consists of a filter query containing the geofilt filter, which looks for stores within d=10 kilometers of the point pt, using the coordinates indexed in the field named by sfield. We can see that there are three stores nearby, in Noida, Ghaziabad, and East Delhi, as per the tags associated with the latitude/longitude points.
The output of our query is shown in the following image:
To find more stores, we will have to increase the distance d from 10 to, say, 30:
http://localhost:8983/solr/collection1/select/?q=*:*&fq={!geofilt pt=28.643059,77.368885 sfield=store d=30}
This will give us stores in Rohini and Paschim Vihar as well, which are farther from the current location.
The output of this query is shown in the following image:
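Widening d admits more stores because a distance filter of this kind keeps the points whose great-circle distance from pt is at most d. The Python sketch below approximates that test with the haversine formula. It illustrates the idea and is not Solr's own code; the two store coordinates are hypothetical and are not taken from location.csv, and the Earth radius constant is an assumed mean value.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


def within(center, points, d_km):
    """Return the names of the points lying within d_km of center."""
    return [name for name, (lat, lon) in points.items()
            if haversine_km(center[0], center[1], lat, lon) <= d_km]


# Hypothetical store coordinates, for illustration only.
stores = {
    "hypothetical_near": (28.70, 77.40),
    "hypothetical_far": (28.70, 77.10),
}
here = (28.643059, 77.368885)

print(within(here, stores, 10))
print(within(here, stores, 30))
```

With these made-up points, only the nearer store passes the 10 km test, while both pass at 30 km, mirroring the effect of changing d in the query.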
The JTS Topology Suite (JTS) is a Java API for modeling and manipulating two-dimensional linear geometry. It provides numerous geometric predicates and functions, and it conforms to the Simple Features Specification for SQL published by the Open Geospatial Consortium (OGC). It offers a complete, robust, and consistent implementation of algorithms for processing linear geometry on a two-dimensional plane, and it is fast enough for production use.
Well-known text (WKT) is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems. The following geometric objects can be represented using WKT:
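To make the syntax concrete, the following Python sketch formats three common WKT geometries: a point, a line string, and a polygon. The helper names are hypothetical, and the sketch covers only a small part of the format. Note that WKT lists coordinates in x y order, so a geographic point is written as longitude followed by latitude.

```python
def wkt_point(x, y):
    """Format a single coordinate pair as a WKT POINT."""
    return f"POINT ({x} {y})"


def wkt_linestring(coords):
    """Format a sequence of (x, y) pairs as a WKT LINESTRING."""
    body = ", ".join(f"{x} {y}" for x, y in coords)
    return f"LINESTRING ({body})"


def wkt_polygon(ring):
    """Format one outer ring as a WKT POLYGON, closing it if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a WKT ring must end where it starts
    body = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({body}))"


point = wkt_point(77.368885, 28.643059)  # longitude first, then latitude
line = wkt_linestring([(0, 0), (1, 1), (2, 0)])
polygon = wkt_polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
print(point)
print(line)
print(polygon)
```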
Spatial4j is an open source Java library intended for general-purpose spatial and geospatial requirements. Its primary responsibilities are organized at three levels:
The primary strength of Spatial4j is its collection of shapes that possess the following set of capabilities:
CONTAINS, WITHIN, DISJOINT, INTERSECTS, and so on for a rectangle
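The relationship names above can be illustrated for the simplest case of two axis-aligned rectangles. The Python sketch below is a simplified model and not Spatial4j's API: the Rect type and relate function are hypothetical, coordinates are treated as a flat plane, and rectangles that cross the date line are ignored.

```python
from collections import namedtuple

# Hypothetical rectangle type: lower-left and upper-right corners.
Rect = namedtuple("Rect", "min_x min_y max_x max_y")


def relate(a, b):
    """Return the relationship of rectangle a to rectangle b."""
    # No overlap on either axis means the rectangles are disjoint.
    if (a.max_x < b.min_x or b.max_x < a.min_x
            or a.max_y < b.min_y or b.max_y < a.min_y):
        return "DISJOINT"
    # a covers b entirely (equal rectangles also land here).
    if (a.min_x <= b.min_x and a.max_x >= b.max_x
            and a.min_y <= b.min_y and a.max_y >= b.max_y):
        return "CONTAINS"
    # b covers a entirely.
    if (b.min_x <= a.min_x and b.max_x >= a.max_x
            and b.min_y <= a.min_y and b.max_y >= a.max_y):
        return "WITHIN"
    # Otherwise the rectangles overlap only partially.
    return "INTERSECTS"


big = Rect(0, 0, 10, 10)
small = Rect(2, 2, 4, 4)
shifted = Rect(8, 8, 12, 12)
far = Rect(20, 20, 30, 30)

print(relate(big, small))
print(relate(small, big))
print(relate(big, shifted))
print(relate(big, far))
```

Treating equal rectangles as CONTAINS is a choice made for this sketch; a library may report that case differently.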