Now that we have somewhat better search results for our e-commerce site, let us look at handling variations. What do we mean by variations? Let us take our earlier example of tommy hilfiger green sweater. For the sake of simplicity, let's say that it comes in three sizes: small, medium, and large. Do we intend to show all three sizes in our search results as individual products? That would be a waste of the display area. On a mobile screen, for example, even if our top result is exactly the green sweater the customer is looking for, it would take up three product slots on the first screen. We could have used that space to show other results that may have been of interest to our customer.
Let us push in the sample data for clothes with the schema given in this chapter. Replace the schema.xml file in the default Solr installation with the one shared in this chapter and run the following command to push the data_clothes.csv file into the Solr index:
java -Dtype=text/csv -jar solr/example/exampledocs/post.jar data_clothes.csv
The query we created earlier can be modified for tommy hilfiger green sweater as follows:
http://localhost:8983/solr/collection1/select?q=tommy%20hilfiger%20green%20sweater&qf=text%20cat^2%20name^2%20brand^2%20clothes_type^2%20clothes_color^2%20clothes_occassion^2&pf=text%20cat^3%20name^3%20brand^3%20clothes_type^3%20clothes_color^3%20clothes_occassion^3&fl=name,brand,price,clothes_color,clothes_size,score&defType=edismax&facet=true&facet.mincount=1&facet.field=clothes_gender&facet.field=clothes_type&facet.field=clothes_size&facet.field=clothes_color&facet.field=brand&facet.field=mobile_os&facet.field=mobile_screen_size&facet.field=laptop_processor&facet.field=laptop_memory&facet.field=laptop_hard_disk
On running this query, we get the following output:
We can see that the first three results are exactly the same; they even have the same score. They differ only in size. Therefore, what is required is that the results that have variations in clothes_color and clothes_size be grouped together. However, which of the two fields do we group by? Should we group by color so that all greens are shown together, or should we group by size so that all medium sizes are shown together? It depends on the input the user has already selected. If the user has selected neither a color nor a size, it makes sense to group by color, so that different sizes of the same color come together. On the other hand, if the user has already selected a size, we need to add a filter query on clothes_size to get the desired output.
In our previous query, grouping by clothes_color will give the following output:
The Solr query will contain the following extra parameters to group by the field clothes_color:
group=true&group.field=clothes_color
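As a rough illustration, the grouping parameters can also be attached to the query from client code. The following Python sketch assumes the Solr endpoint used in the query URL earlier in this section and, for brevity, leaves out the qf, pf, and facet parameters. The helper name grouped_query_url is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Endpoint taken from the query shown earlier in this section.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def grouped_query_url(user_query, group_field="clothes_color"):
    """Return a select URL that groups (collapses) results on group_field.

    Hypothetical helper for illustration; the qf, pf and facet parameters
    of the full query are omitted to keep the sketch short.
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        ("fl", "name,brand,price,clothes_color,clothes_size,score"),
        ("group", "true"),             # switch result grouping on
        ("group.field", group_field),  # field whose values define the groups
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = grouped_query_url("tommy hilfiger green sweater")
print(url)
```

Keeping the parameters as a list of pairs rather than a dictionary allows repeated names such as facet.field or fq to be added later.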
Once the user selects the size he or she is interested in, say medium, grouping by clothes_color after applying the clothes_size filter query will give the following output:
We will add the following filter query to our earlier Solr query:
fq=clothes_size:medium
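A minimal sketch of adding this filter from client code might look as follows. The helper with_size_filter is a hypothetical name, and the base parameter list is a shortened stand-in for the full query shown earlier.

```python
from urllib.parse import urlencode


def with_size_filter(params, size):
    """Return a copy of params with an fq restricting clothes_size.

    Hypothetical helper; params is a list of (name, value) pairs.
    """
    return list(params) + [("fq", "clothes_size:" + size)]


# Shortened stand-in for the grouped query (qf, pf and facets omitted).
base = [
    ("q", "tommy hilfiger green sweater"),
    ("defType", "edismax"),
    ("group", "true"),
    ("group.field", "clothes_color"),
]
filtered = with_size_filter(base, "medium")
query_string = urlencode(filtered)  # the colon is percent-encoded as %3A
print(query_string)
```

Because the filter query restricts the set of matching documents, each color group in the response contains only items of the selected size.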
The same fundamentals can be used to handle variations across multiple products and categories. In e-commerce search, this feature is known as field collapsing. In the previous scenario, we gave color priority over size. In general, we have to pick one aspect of the product variation to take priority and group on that aspect. The remaining aspects appear as facets and are used to filter the results.
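The priority rule can be sketched in a few lines of Python. This is only one possible way to organize the logic, under the assumption that the variation fields are supplied in priority order. The function variation_params is a hypothetical name, not a Solr API.

```python
def variation_params(aspects, selected):
    """Plan grouping, facet and filter parameters for product variations.

    aspects:  variation fields in priority order, highest first
    selected: dict mapping field name to the value the user has chosen
    Hypothetical helper illustrating the priority rule described above.
    """
    group_field, remaining = aspects[0], aspects[1:]
    # Group on the highest-priority aspect.
    params = [("group", "true"), ("group.field", group_field)]
    # Aspects that are not grouped on are offered as facets.
    params += [("facet", "true")] + [("facet.field", f) for f in remaining]
    # Every value the user has picked becomes a filter query.
    params += [("fq", f"{field}:{value}") for field, value in selected.items()]
    return params


aspects = ["clothes_color", "clothes_size"]  # color has priority over size
no_selection = variation_params(aspects, {})
medium_only = variation_params(aspects, {"clothes_size": "medium"})
print(medium_only)
```

With no selection, results are grouped by color and size is offered as a facet. Once the user picks medium, the same grouping applies to the filtered results.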