Book: Apache Solr Search Patterns

Getting data points using facets

Let us refresh our memory about facets. Simply put, faceting refers to the method of categorizing data. A facet on a search result will contain categories and the number of documents in each category. The purpose of facets is to help the user narrow down his or her search result on the basis of some categories. Let us take an example to understand this better.

A search for a mobile phone on the Amazon website brings up facets such as the following:

  • Facet for Brand: We can see a facet for Brand in the following screenshot:
    [Screenshot: Brand facet]

The brand facet is purely intended to help the user shortlist his or her preferences. The count of cell phones for each brand is not displayed, although this information is readily available and can be used for display.

  • Facet for display size: We can see the facet for display size in the following image:
    [Image: Display size facet]

The display size category shows facets based on the range of display sizes. Phones having sizes of less than 3.9 inches are grouped together. Similarly, we can see the count of phones having display sizes in the range 4 to 4.4 inches, and so on.

  • Facet for internal memory: We can see the facet for internal memory of mobile phones in the following image:
    [Image: Internal memory facet]

The facet for internal memory displays the number of cell phones having a particular size of internal memory. This categorization is based on the value of internal memory for each phone.

  • Facet for price and discount: We can see the facet for Price and Discount in the following image:
    [Image: Price and Discount facets]

The facet for price is another example of range faceting, where phones priced at less than $10 are grouped together. Similarly, phones costing between $10 and $25 are counted as a single category, and so on. The discount facet also works on ranges, but its categories are cumulative and listed in increasing order of discount: the category for discounts of more than 10 percent also contains phones that have discounts of more than 25 percent.

All of the above facets can be built using three types of facet queries in Solr: field faceting, query faceting, and range faceting. Let us understand how they work.

Let us add some data into an empty Solr core. Upload the data from the ch04data.csv file provided as code with this chapter by running the following command inside the <solr_folder>/example/exampledocs folder:

 java -Durl=http://localhost:8983/solr/collection1/update -Dtype=text/csv -jar post.jar /path/to/ch04/Code/ch04data.csv 

You can run a simple query to check whether the data has been loaded into the Solr core:

http://localhost:8983/solr/collection1/select/?q=*:*

Field faceting

Field faceting retrieves the count of all terms in a specific indexed field. Field faceting is done to categorize the data on the basis of values in a specific field. We have uploaded some data related to mobiles onto our Solr core. Let us categorize the data on the basis of the different brands of phones that we have in our index and see what we get.

Field faceting is simple; just add the following parameters to the select query:

&facet=true &facet.field=brand_s

To facet on more than one field, add another facet.field parameter to the Solr query. Let us also categorize the indexed mobile phones on the basis of their internal memory. The parameters in our query would now be:

&facet=true &facet.field=brand_s &facet.field=memory_i

The complete Solr query is as follows:

 http://localhost:8983/solr/collection1/select/?q=*:*&facet=true&facet.field=brand_s&facet.field=memory_i

The response can be seen in the following image:

[Image: Solr response with facet counts for brand_s and memory_i]
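The request above can also be assembled and read from a script. The following is a minimal Python sketch, not part of the book's code bundle. It assumes the same host and core as the URL above, adds wt=json and rows=0 (both are additions for illustration, so that only facet counts come back as JSON), and assumes Solr's default JSON layout, in which each facet field is returned as a flat list of alternating terms and counts. The sample counts are invented for the example.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the examples in this section.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def field_facet_url(fields, query="*:*"):
    """Build a select URL that facets on every field named in `fields`."""
    # wt=json and rows=0 are illustrative additions: JSON output, no documents.
    params = [("q", query), ("wt", "json"), ("rows", "0"), ("facet", "true")]
    # facet.field is repeated once per field to facet on several fields.
    params += [("facet.field", name) for name in fields]
    return SOLR_SELECT + "?" + urlencode(params)


def parse_facet_field(flat):
    """Convert a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(field_facet_url(["brand_s", "memory_i"]))
    # Invented sample values, not the real counts from ch04data.csv.
    print(parse_facet_field(["samsung", 3, "nokia", 2]))
```

In a real response, the flat lists would be found under facet_counts, then facet_fields, keyed by field name.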

How does this help in handling big data? When we have huge amounts of data, field faceting can be used to retrieve information about the different fields in the index. For example, if we are dealing with the population of a country, we can index the state and city of each record and facet on those fields. This gives us the number of records, and hence the population, for each state and city.

Query and range faceting

Query faceting can be used to categorize data on the basis of a particular query or a set of queries. When each of those queries is a range on a single field, as in the examples that follow, the result is a form of range faceting. We can create a facet similar to the discount facet that we saw on the Amazon website using query faceting. The facet would categorize the mobile phones into those with discounts of at least 5 percent, 10 percent, 15 percent, and 20 percent. These categories overlap: the count of mobile phones with discounts of at least 10 percent includes the phones with discounts of 15 and 20 percent. To create this facet, we add the following parameters to our Solr query:

&facet=true &facet.query=discount_i:[5 TO *] &facet.query=discount_i:[10 TO *] &facet.query=discount_i:[15 TO *] &facet.query=discount_i:[20 TO *]

The complete Solr query will be as follows:

 http://localhost:8983/solr/collection1/select/?q=*:*&facet=true&facet.query=discount_i:[5 TO *]&facet.query=discount_i:[10 TO *]&facet.query=discount_i:[15 TO *]&facet.query=discount_i:[20 TO *]

The output of the query will create the following facets:

[Image: Solr response with counts for the four discount facet queries]
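Because each of these facet queries is open-ended, the counts overlap. If non-overlapping bands are needed, they can be derived by subtracting neighbouring counts. The Python sketch below illustrates this under stated assumptions: discount_i holds whole numbers, the helper names are hypothetical, and the counts in the example are invented rather than taken from ch04data.csv.

```python
def discount_facet_params(thresholds, field="discount_i"):
    """Return request parameters with one open-ended facet.query per threshold."""
    params = [("q", "*:*"), ("facet", "true")]
    params += [("facet.query", f"{field}:[{t} TO *]") for t in thresholds]
    return params


def disjoint_bands(cumulative):
    """Turn cumulative 'at least t' counts into counts per non-overlapping band.

    `cumulative` maps each threshold to the number of documents at or above it.
    Integer field values are assumed, so the band below threshold u ends at u - 1.
    """
    thresholds = sorted(cumulative)
    bands = {}
    for lower, upper in zip(thresholds, thresholds[1:]):
        bands[f"{lower} to {upper - 1}"] = cumulative[lower] - cumulative[upper]
    last = thresholds[-1]
    bands[f"{last}+"] = cumulative[last]  # top band stays open-ended
    return bands


if __name__ == "__main__":
    print(discount_facet_params([5, 10, 15, 20]))
    # Invented counts for illustration only.
    print(disjoint_bands({5: 12, 10: 9, 15: 4, 20: 1}))
```

With the invented counts above, 12 phones at 5 percent or more and 9 at 10 percent or more leave 3 phones in the 5 to 9 percent band.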

When dealing with analytical data, we often need complex facets that combine query facets and field facets. This gives us several categorizations from a single query and saves the overhead of running multiple queries. With big data this matters even more, because the sheer size of the data can make every single query expensive. It is therefore worth spending time on designing one efficient query that returns as many of the required facets as possible.

Let us create a mixed facet of price, brand, and memory, somewhat similar to the one we saw on the Amazon website. The price facet will contain the count of mobiles priced up to 100, from 101 to 200, from 201 to 300, and more than 300. We will also get facet counts for the brand and the internal memory of mobile phones.

The parameters that we will be adding to the Solr query would be:

&facet=true &facet.query=price_i:[* TO 100] &facet.query=price_i:[101 TO 200] &facet.query=price_i:[201 TO 300] &facet.query=price_i:[301 TO *] &facet.field=brand_s &facet.field=memory_i

The complete query that we would run on Solr to create this complex facet will be:

http://localhost:8983/solr/collection1/select/?q=*:*&facet=true&facet.query=price_i:[* TO 100]&facet.query=price_i:[101 TO 200]&facet.query=price_i:[201 TO 300]&facet.query=price_i:[301 TO *]&facet.field=brand_s&facet.field=memory_i

The output will contain the three facets that we wanted:

[Image: Solr response with the price, brand, and memory facets]
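Writing the bucket boundaries by hand becomes error-prone as the number of ranges grows. As a rough Python sketch, the price queries above could be generated from a list of upper bounds and combined with the field facets in one request. The helper names are hypothetical, rows=0 is an addition for illustration, and integer prices are assumed so that consecutive buckets can start at the previous upper bound plus one.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the examples in this section.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def bucket_queries(field, upper_bounds):
    """Build inclusive, non-overlapping integer range queries for `field`."""
    queries = []
    lower = "*"  # the first bucket is open at the bottom
    for upper in upper_bounds:
        queries.append(f"{field}:[{lower} TO {upper}]")
        lower = upper + 1  # the next bucket starts just above this one
    queries.append(f"{field}:[{lower} TO *]")  # the last bucket is open at the top
    return queries


def mixed_facet_url(price_bounds, facet_fields):
    """Combine price facet queries and field facets in a single request URL."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.query", q) for q in bucket_queries("price_i", price_bounds)]
    params += [("facet.field", name) for name in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    for query in bucket_queries("price_i", [100, 200, 300]):
        print(query)
    print(mixed_facet_url([100, 200, 300], ["brand_s", "memory_i"]))
```

Solr also provides dedicated range-facet parameters (facet.range together with facet.range.start, facet.range.end, and facet.range.gap) for evenly spaced buckets. The sketch stays with facet.query because that is what the examples in this section use.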

In the earlier example of population of a country, we can now create multiple facets such as average income, age, and gender in addition to the simple facets of city and state. We can create a complex Solr query that contains the query for faceting on a certain income range, or another query for faceting on age range, say 0 to 3 years, 3 to 12 years, 12 to 18 years, and so on. Once we obtain this data for the entire country, we can narrow down to state and city facets by adding filter queries to our query.

A filter query, if we remember, simply adds a restriction on the actual query to provide more targeted data. Filter queries are added by using the fq parameter in our Solr query. Therefore, to obtain the facet counts for a city, we add a filter query of the form fq=<city_field>:<city_name> to our Solr query, where <city_field> is whichever field holds the city in our index. The facets will then contain statistical counts for that particular city only.
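As an illustration of the population example, the Python sketch below builds a request that facets on age bands while restricting the counts to one city through fq. The field names city_s and age_i are hypothetical, since the population data set is not part of this chapter's code. Ages are assumed to be whole years, and each boundary age from the text is placed in the upper band so that the bands do not overlap.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the examples in this section.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def city_age_facet_url(city, city_field="city_s", age_field="age_i"):
    """Facet on age bands, limited to a single city by a filter query.

    `city_s` and `age_i` are hypothetical field names used for illustration.
    """
    # Escape backslashes and quotes so the city name stays one quoted phrase.
    safe_city = city.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # illustrative addition: return counts only
        ("fq", f'{city_field}:"{safe_city}"'),
        ("facet", "true"),
        # Whole-year ages assumed: 0 to 2, 3 to 11, 12 to 17.
        ("facet.query", f"{age_field}:[0 TO 2]"),
        ("facet.query", f"{age_field}:[3 TO 11]"),
        ("facet.query", f"{age_field}:[12 TO 17]"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(city_age_facet_url("New York"))
```

Changing only the fq value reuses the same facet definitions for any other city, which is the narrowing step described above.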

The same fundamentals can be extended to clickstream analysis, which we discussed earlier. We can create facets for URLs, referrers, and even the different features being accessed on each URL, provided we have captured the required data in our SolrCloud.
