Solr 4 contains three field types for spatial search: LatLonType (or its non-geodetic twin PointType), SpatialRecursivePrefixTreeFieldType (RPT for short), and BBoxField (available from Solr 4.10 onward). LatLonType has been there since Solr 3. RPT offers more features than LatLonType along with fast filter performance, whereas LatLonType is more appropriate for efficient distance sorting and boosting. With Solr, we can use both fields simultaneously: LatLonType for sorting or boosting and RPT for filtering. BBoxField is used for indexing bounding boxes, querying by a box, specifying search predicates such as Intersects, Within, Contains, Disjoint, or Equals, and relevancy sorting or boosting on properties such as overlapRatio.
We have already seen the LatLonType field, which we used to define the location of our store in the earlier examples. Let us explore RPT and have a look at BBoxField.
The RPT field type available in Solr 4 implements the RecursivePrefixTree search strategy. RecursivePrefixTreeStrategy is a grid- or prefix tree-based class that implements recursive descent algorithms. It is considered the most mature and best-tested strategy to date.
It has the following advantages over the LatLonType field:
LatLonType field and enables the use of geofilt, bbox, and geodist query filters with RPT
We can use the RPT field in Solr by configuring a field of type solr.SpatialRecursivePrefixTreeFieldType in our schema.xml file. Our schema.xml file contains the following code for the RPT field:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    autoIndex="true" geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />
We can change the type of the field named store from location to location_rpt and make it multi-valued:
<field name="store" type="location_rpt" indexed="true" stored="true" multiValued="true" />
Now restart Solr.
If you get the error java.lang.ClassNotFoundException: com.vividsolutions.jts.geom.CoordinateSequenceFactory, please download the JTS library (jts-1.13.jar) from .
Now, put it in the <solr folder>/example/solr-webapp/webapp/WEB-INF/lib path.
Let us understand the options available for the SpatialRecursivePrefixTreeFieldType field type in our schema.xml file:
name: This is the name of the field type, which we specified as location_rpt.
class: This should be solr.SpatialRecursivePrefixTreeFieldType, as we have declared.
spatialContextFactory: This is set to com.spatial4j.core.context.jts.JtsSpatialContextFactory only when polygons or linestrings are required. The JAR file jts-1.13.jar that we put in our lib folder (as mentioned in the note above) is used if this is specified. This context factory has its own options, which are described in its Javadocs. One option that we enabled in our declaration is autoIndex="true", which provides a major performance boost for polygons.
units: This is a mandatory parameter, and degrees is currently the only accepted value. It determines how the maxDistErr attribute, the radius of a circle, and any other absolute distances are interpreted. One degree is approximately 111.2 km, based on the average radius of Earth.
geo: This parameter specifies whether the mathematical model is based on a sphere or on Euclidean (Cartesian) geometry. It is set to true in our case, so latitude and longitude coordinates are used and the mathematical model is generally a sphere. If set to false, the coordinates are generic X and Y values on a two-dimensional Euclidean (Cartesian) plane.
worldBounds: It sets the valid numerical ranges of the x and y coordinates in the minX minY maxX maxY format. If geo="true", the value of this parameter is assumed to be -180 -90 180 90; otherwise, it needs to be specified explicitly.
distCalculator: Defines the distance calculation algorithm. If geo=true, haversine is the default. If geo=false, cartesian is the default. Other possible values are lawOfCosines, vincentySphere, and cartesian^2.
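To make the units and distCalculator options more concrete, here is a minimal Python sketch of the haversine formula and of the conversion between kilometers and degrees. The function names and the mean Earth radius constant are assumptions chosen for illustration; this is not Solr's own code.

```python
import math

# Assumption: a commonly quoted mean Earth radius, in kilometers.
EARTH_MEAN_RADIUS_KM = 6371.0087714


def km_per_degree():
    """Length of one degree of arc along a great circle, in kilometers."""
    return 2 * math.pi * EARTH_MEAN_RADIUS_KM / 360.0


def km_to_degrees(km):
    """Convert an absolute distance in kilometers to degrees of arc."""
    return km / km_per_degree()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))
```

With this radius, km_per_degree() comes out close to 111.2 km, and km_to_degrees(0.001) (that is, 1 m) is slightly below 0.000009 degrees.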
The PrefixTree-based field visualizes the indexed coordinates as a grid. Each grid cell is further divided into another set of grid cells at the next level, thus forming a hierarchy with different levels. The largest cells are at level 1, the next set of smaller cells at level 2, and so on. Here are some configuration options related to prefixTree:
prefixTree: Defines the spatial grid implementation. Since a PrefixTree (such as RecursivePrefixTree) maps the world as a grid, each grid cell is decomposed into another set of grid cells at the next level. If geo=true, then the default prefix tree is geohash; otherwise, it's quad. Geohash has 32 children at each level, and quad has 4. Geohash cannot be used for geo=false as it's strictly geospatial.
distErrPct: Defines the default precision of non-point shapes for both the index and the query as a fraction between 0.0 (fully precise) and 0.5. The closer this number is to zero, the more accurate the shape. We have defined it as 0.025, allowing a small amount of inaccuracy in our shapes. More precise indexed shapes use more disk space and take longer to index. Bigger distErrPct values make querying faster but less accurate.
maxDistErr: Defines the highest level of detail required for indexed data. The default value is 1 m, a little less than 0.000009 degrees. This setting is used internally to compute an appropriate maxLevels value.
maxLevels: Sets the maximum grid depth for indexed data. It is usually more intuitive to let Solr compute an appropriate maxLevels by specifying maxDistErr.
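The relationship between maxDistErr and maxLevels can be illustrated with a small Python sketch. It assumes a deliberately simplified quad-style grid in which the cell width is halved at every level, starting from a world that is 360 degrees wide. The function name is hypothetical, and Solr's internal computation (and the way geohash cells shrink) may give a different number.

```python
def quad_levels_for(max_dist_err, world_width=360.0):
    """Count how many halvings are needed until a cell is no wider than
    max_dist_err (simplified model, width only)."""
    if max_dist_err <= 0:
        raise ValueError("max_dist_err must be positive")
    level = 0
    cell_width = world_width
    while cell_width > max_dist_err:
        cell_width /= 2.0  # each level splits a cell in half along one axis
        level += 1
    return level


# maxDistErr from our schema.xml, in degrees
levels = quad_levels_for(0.000009)
```

Under this model, every extra level halves the cell width, so a tenfold tighter maxDistErr costs only three or four additional levels.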
We will need to clear our index and re-index the location.csv and *.xml files.
The data inside the Solr index for a collection can be entirely deleted using the following Solr queries:
http://localhost:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>
http://localhost:8983/solr/collection1/update?stream.body=<commit/>
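When these requests are sent from a script rather than typed into a browser, the XML in stream.body should be URL-encoded. The following Python sketch only builds the two URLs and does not contact Solr. The helper name update_url is hypothetical; the host, port, and collection name are the ones used in this chapter.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def update_url(stream_body):
    """Return the update URL with the XML command URL-encoded in stream.body."""
    return SOLR_UPDATE_URL + "?" + urlencode({"stream.body": stream_body})


# Delete every document, then commit so the deletion becomes visible.
delete_all_url = update_url("<delete><query>*:*</query></delete>")
commit_url = update_url("<commit/>")

# Decoding the query string gives back the original XML command.
decoded = parse_qs(urlsplit(delete_all_url).query)["stream.body"][0]
```

The resulting URLs can be passed to any HTTP client, for example urllib.request.urlopen.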
Later in this chapter, we will study some queries that employ predicates such as Intersects, IsWithin, and others on the store field (of type RPT).
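As a preview of what such a query can look like, the following Python sketch assembles a filter query string for a rectangle. It assumes the Solr 4 convention of writing a rectangle as minX minY maxX maxY inside the predicate; the helper name and the coordinates are hypothetical.

```python
def rpt_rect_filter(field, predicate, min_x, min_y, max_x, max_y):
    """Build an fq value such as store:"Intersects(minX minY maxX maxY)"."""
    return '{0}:"{1}({2} {3} {4} {5})"'.format(
        field, predicate, min_x, min_y, max_x, max_y)


# Hypothetical rectangle; x is longitude and y is latitude.
fq = rpt_rect_filter("store", "Intersects", -74.093, 41.042, -69.347, 44.558)
```

The value of fq would then be sent as the fq parameter of a normal select request.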
The BBoxField field type can be used to index a single rectangle or bounding box per document field. It supports searching via a bounding box and most spatial search predicates. It has enhanced relevancy modes based on the overlap or area between the search rectangle and the indexed rectangle.
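The idea behind overlap-based relevancy can be sketched in a few lines of Python. This is a simplified illustration on a flat plane that divides the intersection area by the query area; it is not the formula Solr uses for overlapRatio, and all names are hypothetical.

```python
def rect_area(rect):
    """Area of a rectangle given as (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = rect
    return max(0.0, max_x - min_x) * max(0.0, max_y - min_y)


def overlap_fraction(query, indexed):
    """Fraction of the query rectangle covered by the indexed rectangle."""
    inter = (max(query[0], indexed[0]), max(query[1], indexed[1]),
             min(query[2], indexed[2]), min(query[3], indexed[3]))
    query_area = rect_area(query)
    if query_area == 0.0:
        return 0.0  # a degenerate query rectangle has nothing to cover
    return rect_area(inter) / query_area
```

A document whose box covers the whole query rectangle scores 1.0 under this model, while a disjoint box scores 0.0.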
To define it in our schema, we first have to declare a fieldType of class solr.BBoxField whose numberType refers to a separate fieldType of class solr.TrieDoubleField:
<fieldType name="bbox" class="solr.BBoxField" geo="true" units="degrees" numberType="_bbox_coord" />
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false" />
Now we define a field of type bbox:
<field name="bbox" type="bbox" />
Since this feature is available in Solr 4.10 onward, we will not delve into the implementation.