Book: Apache Solr Search Patterns

Chapter 6. Solr for Spatial Search

In the previous chapter, we discussed in depth the problems faced during the implementation of Solr for search operations on an e-commerce website. We saw solutions to the problems and areas where optimizations may be necessary. We also took a look at semantic search and how it can be implemented in an e-commerce scenario.

In this chapter, we will explore Solr with respect to spatial search. We will look at different indexing techniques and study query types that are specific to spatial data. We will also learn different scenario-based filtering and searching techniques for geospatial data.

The topics that we will cover in this chapter are:

  • Features of spatial search
  • Lucene 4 spatial module
  • Indexing for spatial search
  • Search and filtering on spatial index
  • Distance sort and relevance boost
  • Advanced concepts
    • Quadtrees
    • Geohash

Features of spatial search

With Solr, we can combine location-based data with normal text data in our index. This is termed spatial search or geospatial search.

Earlier versions of Solr (Solr 3.x) provided the following features for spatial search:

  • Representation of spatial data as latitude and longitude in Solr
  • Filtering by the geofilt and bounding box (bbox) filters
  • Use of the geodist function to calculate distance
  • Distance-based faceting and boosting of results
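To make the distance calculation concrete, the following is a minimal Python sketch of the haversine formula, one common way to compute the great-circle distance between two latitude / longitude points. It illustrates the idea behind a distance function such as geodist and is not Solr's own code; the names haversine_km and within_radius and the Earth radius constant are assumptions made for this example.

```python
import math

# Mean Earth radius in kilometers (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0087714


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, d_km):
    """Keep the (lat, lon) points that lie within d_km of center (hypothetical helper)."""
    return [p for p in points
            if haversine_km(center[0], center[1], p[0], p[1]) <= d_km]
```

A radius filter then amounts to keeping the points whose computed distance does not exceed the radius, which is what within_radius does in a brute-force way. A real spatial index avoids scanning every point.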

Solr 4 introduced the following new features:

  • Support for new shapes: Polygon, LineString, and other new shapes are supported as indexed and query shapes in Solr 4. Shapes other than points, rectangles, and circles are supported via the Java Topology Suite (JTS), an optional dependency that we will discuss later.
  • Indexing multi-valued fields: This is critical for storing the results of automatic place extraction from text using natural language processing techniques, since a variable number of locations will be found for a single record.
  • Indexing both point and non-point shapes: Non-point shapes are essentially pixelated to a configured resolution per shape. By default, that resolution is defined as a percentage of the overall shape size, and it applies to query shapes as well. If very high precision along shape edges must be retained, this approach probably won't scale well, because the index grows large and indexing becomes slow. Query shapes, on the other hand, generally scale well up to the maximum configured precision regardless of shape size.
  • Rectangles with user-specifiable corners: Solr 3 spatial supports only the bounding box of a circle, whereas Solr 4 accepts rectangles whose corners are specified directly.
  • Multi-valued distance sorting and score boosting: The current implementation is unoptimized and uses large amounts of RAM.
  • Configurable precision: In Solr 4, precision can vary per shape at query time and, to a lesser extent, at index time. This is mainly used to improve performance.
  • Fast filtering: The code outperforms the LatLonType of Solr 3 at single-valued indexed points. Also, Solr 3 LatLonType at times requires all the points to be in memory, while the new spatial module here doesn't.
  • Support for Well-known Text (WKT) via JTS: WKT is arguably the most widely supported textual format for shapes.

Let us look at an example of storing and searching locations in Solr. We will need two definitions in our Solr schema.xml file: a fieldType named location of class solr.LatLonType, and a dynamic field named *_coordinate of type tdouble. The location type refers to this dynamic field through its subFieldSuffix attribute and uses it to index the latitude and longitude components of each data point:

<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<!-- Type used to index the lat and lon components for the "location" FieldType -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

We then define a field named store of type location, which holds the geospatial index for each store's location:

<field name="store" type="location" indexed="true" stored="true"/>

Let us index a few locations into Solr and see how geospatial search works. Go into the exampledocs folder inside the Solr installation and run the following command to index the location.csv file provided with this chapter:

 java -Dtype=text/csv -jar post.jar location.csv 

Now let us see which stores are near our location. On Google Maps, we can see that our location is 28.643059, 77.368885. Therefore, the query to find stores within 10 km of our location will be:

http://localhost:8983/solr/collection1/select/?q=*:*&fq={!geofilt pt=28.643059,77.368885 sfield=store d=10}

The query uses a filter query (fq) containing the geofilt filter, which matches stores in the sfield field (store) that lie within d=10 kilometers of the point pt. The results show three nearby stores, in Noida, Ghaziabad, and East Delhi, as per the tags associated with the latitude / longitude points.
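The braces, spaces, and commas in the fq parameter usually need URL encoding when the request is sent from code rather than typed into a browser. The following Python sketch builds the same geofilt request with the standard library. The helper name geofilt_url is hypothetical, and the host, core name, and field name are simply those used in this example.

```python
from urllib.parse import urlencode


def geofilt_url(base_url, sfield, lat, lon, d_km):
    """Build a select URL that filters documents within d_km of (lat, lon)."""
    params = {
        "q": "*:*",
        # Doubled braces produce literal { and } inside the f-string.
        "fq": f"{{!geofilt pt={lat},{lon} sfield={sfield} d={d_km}}}",
    }
    # urlencode percent-encodes the braces and commas and turns spaces into '+'.
    return f"{base_url}?{urlencode(params)}"


url = geofilt_url(
    "http://localhost:8983/solr/collection1/select",
    "store", 28.643059, 77.368885, 10,
)
print(url)
```

Changing the last argument changes only the d parameter, so widening the search radius does not require touching the rest of the query.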

The output of our query is shown in the following image:


Stores within 10 km from our location point

To find more stores, we have to increase the distance d from 10 to, say, 30:

http://localhost:8983/solr/collection1/select/?q=*:*&fq={!geofilt pt=28.643059,77.368885 sfield=store d=30}

This will give us stores in Rohini and Paschim Vihar as well, which are farther from the current location.

The output of this query is shown in the following image:


Java Topology Suite

JTS is an API for modeling and manipulating two-dimensional linear geometry. It provides numerous geometric predicates and functions. It conforms to the Simple Features Specification for SQL published by the Open GIS Consortium, and provides a complete, robust, and consistent implementation of algorithms for processing linear geometry on a two-dimensional plane. It is fast and meant for production use.

Well-known Text

WKT is a text mark-up language for representing vector geometry objects on a map, spatial reference systems of spatial objects, and transformations between spatial reference systems. The following geometric objects can be represented using WKT:

  • Points and multi-points
  • Line Segment (LineString) and multi-line segment
  • Triangle, polygon, and multi-polygon
  • Geometry
  • CircularString
  • Curve, MultiCurve, and CompoundCurve
  • CurvePolygon
  • Surface, multi-surface, and polyhedron
  • Triangulated irregular network
  • GeometryCollection
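As an illustration of the format, a point is written in WKT as POINT (77.368885 28.643059) and a polygon as POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0)). Coordinates are given in x y order, so for geographic data the longitude comes first. The following Python sketch parses only these simplest cases. The function parse_wkt is a hypothetical helper written for this example, not part of JTS or Spatial4j, and it does not handle polygon holes, empty geometries, or the other geometry types listed above.

```python
import re

_WKT_PATTERN = re.compile(
    r"\s*(POINT|LINESTRING|POLYGON)\s*\((.*)\)\s*",
    re.IGNORECASE | re.DOTALL,
)


def parse_wkt(text):
    """Parse a WKT POINT, LINESTRING, or single-ring POLYGON.

    Returns (type, coordinates): one (x, y) tuple for a POINT,
    a list of (x, y) tuples otherwise.
    """
    match = _WKT_PATTERN.fullmatch(text)
    if not match:
        raise ValueError(f"unsupported WKT: {text!r}")
    kind, body = match.group(1).upper(), match.group(2).strip()
    if kind == "POLYGON":
        # Accept exactly one ring; polygons with holes are out of scope here.
        if not (body.startswith("(") and body.endswith(")")) or "(" in body[1:]:
            raise ValueError(f"only single-ring polygons are supported: {text!r}")
        body = body[1:-1]
    coords = [tuple(float(v) for v in pair.split()) for pair in body.split(",")]
    return (kind, coords[0]) if kind == "POINT" else (kind, coords)
```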

The Spatial4j library

Spatial4j is an open source Java library intended for general-purpose spatial and geospatial requirements. Its primary responsibilities fall into three areas:

  • Providing shapes that function well with Euclidean and spherical surface models
  • Calculating distance and applying other mathematical operations
  • Reading shapes from WKT strings

The primary strength of Spatial4j is its collection of shapes that possess the following set of capabilities:

  • Calculation of the latitude / longitude bounding box
  • Computation of the area of different shapes
  • Figuring out if a shape contains a given point
  • Computation of the relationship of a shape to a rectangle, such as CONTAINS, WITHIN, DISJOINT, or INTERSECTS
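The capabilities listed above can be illustrated for the simplest shape, an axis-aligned rectangle. The following Python sketch is not the Spatial4j API. The class Rect, its methods, and the function bounding_box are hypothetical names for this example, and the code assumes a flat plane in which rectangles do not cross the date line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self):
        """Area of the rectangle in squared coordinate units."""
        return (self.max_x - self.min_x) * (self.max_y - self.min_y)

    def contains_point(self, x, y):
        """True if (x, y) lies inside the rectangle or on its edge."""
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

    def relate(self, other):
        """Relationship of this rectangle to another rectangle."""
        if (other.min_x > self.max_x or other.max_x < self.min_x
                or other.min_y > self.max_y or other.max_y < self.min_y):
            return "DISJOINT"
        if (self.min_x <= other.min_x and other.max_x <= self.max_x
                and self.min_y <= other.min_y and other.max_y <= self.max_y):
            return "CONTAINS"  # identical rectangles are reported as CONTAINS
        if (other.min_x <= self.min_x and self.max_x <= other.max_x
                and other.min_y <= self.min_y and self.max_y <= other.max_y):
            return "WITHIN"
        return "INTERSECTS"


def bounding_box(points):
    """Smallest Rect enclosing a non-empty list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))
```

With longitude as x and latitude as y, bounding_box gives the latitude / longitude bounding box of a set of points, and relate answers the kind of question a spatial filter asks about an indexed shape and a query rectangle.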