In this chapter, we explored text tagging with the help of Lucene and Solr. We learnt what FSTs are and how they are implemented in Lucene. We also went through some well-known text tagging algorithms and got a brief idea of how text tagging is implemented in Solr. We explored the SolrTextTagger package by installing it as a module in Solr and saw some examples of text tagging using this package.
This is the last chapter in this book. In our journey through the book, we started with Solr indexing internals, where we saw the roles of analyzers and tokenizers in index creation. We also saw multilingual search and discussed the challenges of large-scale indexing and the solutions to these problems. We then saw how Solr's scoring algorithm can be tweaked and customized. We discussed some existing algorithms as well as concept scoring algorithms. In the chapter that followed, we explored Solr internals and learnt how the relevancy scoring algorithm works on the inverted index. We delved into the query parsers available in Solr and implemented a Solr plugin for performing proximity search.
Next, we moved on to use cases, where we saw how Solr can be used for analytics, big data processing, and creating graphs. We saw an example of the use of Solr in e-commerce and discussed the relevant problems and solutions. Then, we explored the use of Solr for spatial search and discussed in depth the geospatial search plugin available with Solr. We went through the problems faced when implementing Solr in an advertising system and discussed some solutions to them.
In the advanced stage, we covered AJAX Solr, an asynchronous library for executing Solr queries from the browser, and discussed its features and advantages. We also configured SolrCloud and saw how it addresses the problems of horizontal scalability by providing distributed indexing and search. SolrCloud can also be used as a NoSQL database. Finally, we learnt how text tagging can be performed using Solr and Lucene's FST library.