Location-based data can be represented in Solr using latitudes and longitudes. Applications can combine other data with location information to provide more insight into the data pertaining to a certain location. In analytics, location-based data is very important. Whether we are dealing with sales information, statistical information of any kind, or information pertaining to visits to a website, having a location in addition to the numbers that we already have provides additional insight from a regional perspective.
We will delve into how geospatial searches happen in Solr in the chapter Solr for Spatial Search. For the current chapter, let us understand the different types of location filters available in Solr.
For spatial filters, the following parameters are used in Solr:
d: Radial distance in kilometers
pt: Center point in the format latitude,longitude (for example, 45.15,-93.85)
sfield: Refers to a spatially indexed field
In order to run queries, we need the example documents indexed in our running Solr instance. Simply run the following command in the exampledocs folder of your Solr installation to get these documents indexed in Solr:
java -jar post.jar *.xml
A query on the complete index will tell us that we now have around 52 records in our index:
http://localhost:8983/solr/collection1/select/?q=*:*
Different types of spatial filter queries can be defined as follows.
The geofilt filter retrieves results based on their geospatial distance from a given center point; that is, it creates a circular filter around that point. For example, to find all documents within 5 km of a given lat/lon point, we could use the parameters &q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5.
This is shown in the following image:
Let us execute the query we have formed on our Solr index. The complete query will be:
http://localhost:8983/solr/collection1/select/?q=*:*&fq={!geofilt%20sfield=store}&pt=45.15,-93.85&d=5
This gives us three records that are within 5 km of the specified lat/lon position (45.15, -93.85).
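The geofilt filter above keeps a document when the great-circle distance between its store point and pt is at most d kilometers. The following Python sketch reproduces that check on the client side with the haversine formula, purely to illustrate the semantics. The helper names haversine_km and geofilt are hypothetical, the sample points are made up, and the Earth radius of 6371 km is an assumption that may differ slightly from the constant Solr uses, so points lying right on the 5 km boundary could be classified differently.

```python
import math

# Assumption: mean Earth radius in km; Solr's internal constant may differ slightly.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing the value just above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geofilt(points, pt, d):
    """Keep the (lat, lon) points that lie within d km of pt, i.e. inside a circle."""
    lat0, lon0 = pt
    return [p for p in points if haversine_km(lat0, lon0, p[0], p[1]) <= d]


if __name__ == "__main__":
    center = (45.15, -93.85)  # same pt as the query above
    sample = [(45.17, -93.87), (45.18, -93.88), (45.20, -93.80)]  # hypothetical points
    print(geofilt(sample, center, 5))
```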
The bounding box, or bbox, filter is very similar to geofilt, except that it uses the bounding box of the calculated circle, similar to the box shown in the following diagram. It takes the same parameters as geofilt, but the rectangular shape is faster to compute. Therefore, it is sometimes used as an alternative to geofilt when it is acceptable to return points that lie outside the radius but inside the box.
We can use the same query we ran earlier and ask for a bbox filter instead of a geofilt filter. The following query applies a bbox filter:
http://localhost:8983/solr/collection1/select/?q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=5
The same query now returns five results instead of three. The last two results are outside the geofilt circle but inside the bounding box.
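To see why bbox can return extra points, the following sketch computes the latitude/longitude box that encloses a circle of radius d around pt and tests whether a point falls inside it. This is a simplified illustration, not Solr's implementation: it assumes a spherical Earth with a 6371 km radius and does not handle circles that reach a pole or cross the 180th meridian. The function names bounding_box and in_box are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def bounding_box(pt, d):
    """Return (min_lat, min_lon, max_lat, max_lon) of the box enclosing a d-km circle around pt.

    Simplified: raises for circles that reach a pole and ignores the 180th meridian.
    """
    lat0, lon0 = pt
    r = d / EARTH_RADIUS_KM  # angular radius of the circle, in radians
    dlat = math.degrees(r)
    ratio = math.sin(r) / math.cos(math.radians(lat0))
    if ratio >= 1.0:
        raise ValueError("circle reaches a pole; box would span all longitudes")
    # Widest longitude offset of a circle on a sphere: sin(dlon) = sin(r) / cos(lat0)
    dlon = math.degrees(math.asin(ratio))
    return (lat0 - dlat, lon0 - dlon, lat0 + dlat, lon0 + dlon)


def in_box(point, box):
    """True if the (lat, lon) point lies inside the box (edges included)."""
    min_lat, min_lon, max_lat, max_lon = box
    lat, lon = point
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


if __name__ == "__main__":
    box = bounding_box((45.15, -93.85), 5)
    print(box)
    print(in_box((45.19, -93.80), box))
```

For example, the point (45.19, -93.80) lies inside the box for a 5 km radius around (45.15, -93.85) even though it is more than 5 km from the center, which is exactly the kind of extra result bbox returns.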
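Instead of the bbox filter, we can also run a rectangle filter, which is a range query on the spatial field from the lower-left corner to the upper-right corner of the area. Because the bbox filter always produces the box that encloses the circle (a square in terms of ground distance), a rectangle filter fetches the same results only when it describes that same square; otherwise, it lets us select any rectangular area. The query for executing the rectangle filter is as follows: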
http://localhost:8983/solr/collection1/select/?q=*:*&fq=store:[45,-94 TO 46,-93]
The following image shows the area that will be used for the rectangle filter:
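A rectangle filter is a range query whose lower bound is the lower-left (south-west) corner and whose upper bound is the upper-right (north-east) corner, with both bounds included. The hypothetical helpers below build that fq value and apply the equivalent containment test locally; they assume the rectangle does not cross the 180th meridian.

```python
def rectangle_fq(field, lower_left, upper_right):
    """Build a range query selecting points inside a lat/lon rectangle.

    lower_left is the (lat, lon) of the south-west corner, upper_right the north-east corner.
    """
    (lat1, lon1), (lat2, lon2) = lower_left, upper_right
    return f"{field}:[{lat1},{lon1} TO {lat2},{lon2}]"


def in_rectangle(point, lower_left, upper_right):
    """Local equivalent of the range query: both coordinates must fall within the bounds."""
    lat, lon = point
    return (lower_left[0] <= lat <= upper_right[0]
            and lower_left[1] <= lon <= upper_right[1])


if __name__ == "__main__":
    print(rectangle_fq("store", (45, -94), (46, -93)))
```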
Solr provides a set of function queries to calculate distance during querying:
geodist: This takes three optional parameters (sfield, lat, lon) and can be used to sort results on the basis of distance. When called without arguments, as geodist(), it uses the sfield and pt request parameters. For example, to sort results by ascending distance, we would append the sort=geodist() asc parameter to our previous geofilt query.
dist: This calculates the distance between two points (vectors) in an n-dimensional space, for example the Euclidean or Manhattan distance, depending on its power argument.
hsin: This calculates the Haversine (great-circle) distance between two points on a sphere.
sqedist: This calculates the squared Euclidean distance between two points. It skips the square root, which makes it cheaper to compute, and it is useful when only the relative order of distances matters, for example when sorting. For more information, refer to the following wiki page:
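The following sketch mirrors what dist, sqedist, and hsin compute, along with a client-side analogue of sort=geodist() asc. These are plain Python functions written for illustration; their argument lists are simplified and do not match the exact signatures of Solr's function queries. Coordinates passed to hsin are assumed to be (lat, lon) in degrees, and the default radius of 6371 km is an assumption.

```python
import math


def dist(power, a, b):
    """p-norm distance between equal-length vectors a and b (power=1 Manhattan, power=2 Euclidean)."""
    if power <= 0:
        raise ValueError("this sketch only supports positive powers")
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


def sqedist(a, b):
    """Squared Euclidean distance: dist(2, a, b) ** 2, without computing the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hsin(radius, a, b):
    """Haversine distance on a sphere of the given radius; a and b are (lat, lon) in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(h)))


def sort_by_distance(points, pt, radius=6371.0):
    """Client-side analogue of sort=geodist() asc: nearest points first (radius in km, assumed)."""
    return sorted(points, key=lambda p: hsin(radius, pt, p))
```

Because sqedist omits the square root, ordering points by sqedist gives the same order as ordering them by dist(2, ...), which is why it is a cheaper choice for sorting.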
Now that we have briefly covered the distance filters and function queries that can be used with location information, we can use them to filter our search results on the basis of the radial distance from a given location.
Let us say that we need to figure out how many people visit our website via mobile phones from different regions. For this, we capture the GPS coordinates of the mobile phone, and log and index the information into our SolrCloud.
Now, in order to obtain the number of mobile phones accessing our website at a particular time and from a particular region, we need to create range facets, mostly using the rectangle filter, and divide our target region into different sections. A sample facet query would be as follows:
facet=true&facet.query=store:[45,-94 TO 46,-93]&facet.query=store:[45,-93 TO 46,-92]&facet.query=store:[46,-94 TO 47,-93]
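Writing grid facets by hand becomes tedious for larger regions. The sketch below, a hypothetical helper, splits a lat/lon rectangle into rows by cols cells and emits one rectangle facet.query per cell as (name, value) pairs that can then be URL-encoded. For a 2 by 2 grid over (45,-94) to (47,-92), the first three cells match the facet queries shown above.

```python
from urllib.parse import urlencode


def _fmt(x):
    """Format a coordinate without trailing zeros (6 decimals, roughly 0.1 m)."""
    return format(x, ".6f").rstrip("0").rstrip(".")


def grid_facet_queries(field, lower_left, upper_right, rows, cols):
    """Split a lat/lon rectangle into rows x cols cells, one facet.query per cell."""
    lat_min, lon_min = lower_left
    lat_max, lon_max = upper_right
    dlat = (lat_max - lat_min) / rows
    dlon = (lon_max - lon_min) / cols
    params = [("facet", "true")]
    for i in range(rows):
        for j in range(cols):
            lat1, lon1 = lat_min + i * dlat, lon_min + j * dlon
            lat2, lon2 = lat1 + dlat, lon1 + dlon
            params.append(("facet.query",
                           f"{field}:[{_fmt(lat1)},{_fmt(lon1)} TO {_fmt(lat2)},{_fmt(lon2)}]"))
    return params


if __name__ == "__main__":
    params = [("q", "*:*")] + grid_facet_queries("store", (45, -94), (47, -92), 2, 2)
    print(urlencode(params))
```

Because range queries include their bounds, a point lying exactly on an edge shared by two cells is counted in both of them, just as with the hand-written facet queries above.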
If we have city names, we can facet by city and also add multiple geofilt filters to create multiple facets of regions inside the city that are of interest to us.
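The multiple-geofilt approach can be sketched in the same way. Here each named region of interest becomes a facet.query that carries its own center and radius as local parameters ({!geofilt sfield=... pt=... d=...}). The helper name, region names, and coordinates are hypothetical. Solr reports the count for each facet.query keyed by the query string, so the helper returns a mapping from region name to query string to make the counts easy to match back.

```python
from urllib.parse import urlencode


def region_facet_queries(field, regions):
    """Map each region name to a geofilt facet.query.

    regions maps name -> ((lat, lon), radius_km); names are used by the caller only.
    """
    return {
        name: f"{{!geofilt sfield={field} pt={lat},{lon} d={d}}}"
        for name, ((lat, lon), d) in regions.items()
    }


if __name__ == "__main__":
    # Hypothetical regions of interest inside a city.
    regions = {
        "downtown": ((45.15, -93.85), 2),
        "airport": ((45.10, -93.90), 3),
    }
    queries = region_facet_queries("store", regions)
    params = [("q", "*:*"), ("facet", "true")]
    params += [("facet.query", q) for q in queries.values()]
    print(urlencode(params))
```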
Another way to facet is by using the frange filter. With frange, we can create facets of concentric circular regions around a center point. The following query will create two facets around the center point (45.15,-93.85). The first facet starts at the center point and extends to a radius of 5 km. The second facet starts at 5.001 km from the center point and extends to 10 km:
http://localhost:8983/solr/select?q=*:*&sfield=store&pt=45.15,-93.85&facet.query={!frange l=0 u=5}geodist()&facet.query={!frange l=5.001 u=10}geodist()&facet=true
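Ring facets for more than two distances follow a regular pattern, so they can be generated. The hypothetical helper below turns a list of outer radii into frange facet queries over geodist(). Instead of starting each ring at a value like 5.001, it uses frange's incl=false local parameter to exclude the lower bound, so a document at exactly 5 km is counted only in the first ring and no gap is left between rings. The request still needs the sfield and pt parameters, as in the query above.

```python
from urllib.parse import urlencode


def ring_facet_queries(radii_km):
    """Build frange facet queries for concentric rings, e.g. [5, 10] -> [0, 5] km and (5, 10] km."""
    queries = []
    lower = 0
    for i, upper in enumerate(radii_km):
        # frange includes both bounds by default; exclude the lower bound after the
        # first ring so a document on a ring boundary is counted only once.
        excl = "" if i == 0 else " incl=false"
        queries.append(f"{{!frange l={lower} u={upper}{excl}}}geodist()")
        lower = upper
    return queries


if __name__ == "__main__":
    params = [("q", "*:*"), ("sfield", "store"), ("pt", "45.15,-93.85"), ("facet", "true")]
    params += [("facet.query", q) for q in ring_facet_queries([5, 10, 20])]
    print(urlencode(params))
```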
Analytics using location data is a powerful tool for understanding and resolving issues that arise from differences in location. Why are sales poor in a certain part of a city? Why do so many visits to our website come from a certain region spanning multiple cities? Such questions can be identified and answered by indexing this data into SolrCloud and writing Solr queries with filters and facets. We can create facets by distance from certain store locations, using radius or range faceting around a store, and determine the number of sales in each of the facets we have created. This can give us deep insight into what can be done to improve the numbers that we are trying to achieve.