Architecture of an ad distribution system

Now that we have a brief overview of the functionalities provided by an advertising system, we can look at the architecture of the advertising system and understand where Solr fits in the picture.

The system would receive parameters such as placement of the ad, keywords related to the ad, and the type of ad to be displayed. On the basis of these parameters, the system will identify the ad to be displayed. Most of the data required for ad display is stored as a browser cookie on an end user's system. This cookie can contain tracking and targeting information. This cookie information is sent over to the ad distribution network and is used for identifying the ad to be displayed and also for gathering the tracking and behavioral information.

The ad system generally works on JSON, HTML, and JavaScript frameworks on the frontend. JavaScript is used on the client side and is placed on the web page on which the ad is to be displayed. JavaScript handles all the communication between the ad distribution network and the browser on which the ad is to be displayed. Data is generally shared between the ad distribution network and the JavaScript client on the browser in the JSON format.
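As a rough illustration of this JSON exchange, the following sketch builds an ad request and parses a response. Python is used here only to show the shape of the payloads; in practice the client side would be JavaScript. All field names (`placement`, `keywords`, `ad_type`, `cookie_id`, `ad_id`, `creative_url`) are assumptions made for this example and do not come from any particular ad network.

```python
import json


def build_ad_request(placement, keywords, ad_type, cookie_id=None):
    """Serialize the ad parameters into a JSON string (hypothetical field names)."""
    payload = {
        "placement": placement,
        "keywords": sorted(keywords),
        "ad_type": ad_type,
    }
    # Tracking/targeting data from the browser cookie is optional.
    if cookie_id is not None:
        payload["cookie_id"] = cookie_id
    return json.dumps(payload, sort_keys=True)


def parse_ad_response(body):
    """Extract the ad identifier and creative URL from a JSON response."""
    data = json.loads(body)
    return data["ad_id"], data["creative_url"]


request_body = build_ad_request(
    "sidebar", ["solr", "search"], "banner", cookie_id="abc123"
)
```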

On the backend, the ad distribution system requires searching, filtering, sorting, and logging of data. This is where Solr comes into the picture. Ad distribution systems are high-performance and high-availability systems. Each web page can contain multiple ads, and ads are displayed on many web pages across multiple sites. Hence, the number of ad requests per second received by an ad distribution network will be much higher than the combined page views per second of all the sites that serve ads from this network. Also, 100 percent availability is required, as downtime not only leads to loss of revenue but also damages the credibility of the ad distribution system.
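To make the searching, filtering, and sorting concrete, here is a minimal sketch that builds a Solr select URL from the ad parameters. The request parameters `q`, `fq`, `sort`, `rows`, and `wt` are standard Solr query parameters; the core name `ads` and the field names `keywords`, `placement`, `ad_type`, and `bid` are assumptions made for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_ad_query_url(base_url, keywords, placement, ad_type, rows=1):
    """Build a Solr /select URL for ad lookup (hypothetical schema fields).

    Note: keywords are not escaped here, so they must not contain
    Lucene query syntax characters.
    """
    params = [
        # Search: match any of the supplied keywords.
        ("q", "keywords:(" + " OR ".join(keywords) + ")"),
        # Filter: restrict to the requested placement and ad type.
        ("fq", "placement:" + placement),
        ("fq", "ad_type:" + ad_type),
        # Sort: highest bid first (assumed ranking criterion).
        ("sort", "bid desc"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


url = build_ad_query_url(
    "http://localhost:8983/solr/ads/select",  # assumed host and core name
    ["solr", "search"],
    "sidebar",
    "banner",
)
```

Using separate `fq` parameters lets Solr cache each filter independently, which suits a workload where the same placements and ad types recur across many requests.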

Logs are collected and analyzed to improve profitability. Various technologies are used on the backend: databases such as MySQL and MongoDB, caching systems such as Redis, web servers such as Nginx or Apache, and of course Solr for search. A sample architecture for an ad distribution system is shown in the following figure:

Ad distribution system architecture

We can see that there is a load balancer that, on receiving requests, distributes them to the backend web servers. Here we are using Apache as the web (HTTP) server. The HTTP server forwards requests to the Apache Tomcat application server, which contains the business logic. The application server first checks the Redis cache for the required information. If the information is not in the cache, the application server fetches it from the MySQL database and stores it in Redis. If the request is related to targeting or search, the application server again checks Redis first and queries the Solr slave only if the information is not found in the cache. Indexing happens on the Solr master and can be scheduled at certain intervals; the updated index is then replicated to the Solr slave.
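The cache lookup described above follows what is commonly called the cache-aside pattern. The sketch below shows the idea with an in-memory class standing in for Redis and a plain function standing in for MySQL or the Solr slave; `DictCache`, `fetch_with_cache`, and `fake_backend` are hypothetical names introduced for this example.

```python
class DictCache:
    """In-memory stand-in for a Redis client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def fetch_with_cache(cache, key, load_from_backend):
    """Return the cached value, or load it from the backend and cache it."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit: the backend is not touched
    value = load_from_backend(key)  # cache miss: ask MySQL or Solr
    if value is not None:
        cache.set(key, value)
    return value


backend_calls = []


def fake_backend(key):
    """Pretend backend that records every call it receives."""
    backend_calls.append(key)
    return {"ad_id": 42, "key": key}


cache = DictCache()
first = fetch_with_cache(cache, "ad:sidebar:banner", fake_backend)
second = fetch_with_cache(cache, "ad:sidebar:banner", fake_backend)
```

The second lookup is served from the cache, so the backend is called only once. A real deployment would also set an expiry on cached entries so that newly indexed ads become visible.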

The web, application, caching, and Solr servers are all highly available. The database master and the Solr indexing server can each be organized in a master-master arrangement to achieve high availability at that level. The aim here should be to eliminate every single point of failure.

Note

We have included the Solr slave, Redis cache, and the application on the web server itself. Therefore, each server acts as an independent node behind the load balancer. This reduces internal network traffic and the number of moving parts. However, if a single server cannot host all the parts required for the application, it is recommended to spread them out and balance the load between internal hosts using a load balancer. In such a scenario, we would have a cluster of Redis slaves behind a load balancer, a cluster of Solr slaves, and so on.

We may consider replacing the Apache web server with Nginx, a lightweight, event-driven web server that can handle more concurrent connections than Apache. The Solr master-slave architecture can be replaced with SolrCloud, which provides better indexing performance and higher availability of the Solr nodes. This is discussed in the chapter on SolrCloud. Currently, with the master-slave setup, the following process needs to be followed in order to update the Solr schema:

1. Stop replication between the master and slave servers.
2. Remove one or more web or application servers from the load balancer, disabling all requests on those servers.
3. Update the Solr schema on the master server.
4. Replicate it onto the Solr slaves that have been removed from the load balancer and that do not serve any requests.
5. Update the application on the machines that have been removed from the load balancer.
6. Put the updated machines back into the load balancer and remove all the other machines from the load balancer.
7. Replicate the index onto the Solr slaves on the remaining machines, which are now out of the load balancer.
8. Update the application on the remaining machines.
9. Put the remaining machines back into the load balancer after Solr replication is complete.
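The steps above can be sketched as a small simulation that tracks which machines are serving traffic at each step. This is only a model of the procedure, not a deployment script: the `Node` class, the `master` dictionary, and `rolling_schema_update` are hypothetical names, and versions are represented as plain integers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A web/application server that also hosts a Solr slave."""
    name: str
    in_lb: bool = True
    schema: int = 1
    app: int = 1


def rolling_schema_update(master, nodes, new_version, first_batch=1):
    """Simulate the nine steps; return a log of (step, names serving traffic)."""
    if not 0 < first_batch < len(nodes):
        raise ValueError("both batches must contain at least one node")
    log = []

    def record(step):
        log.append((step, [n.name for n in nodes if n.in_lb]))

    batch_a, batch_b = nodes[:first_batch], nodes[first_batch:]
    master["replicating"] = False        # 1. stop automatic replication
    record(1)
    for n in batch_a:                    # 2. take the first batch out of the LB
        n.in_lb = False
    record(2)
    master["schema"] = new_version       # 3. update the schema on the master
    record(3)
    for n in batch_a:                    # 4. replicate to the idle slaves
        n.schema = master["schema"]
    record(4)
    for n in batch_a:                    # 5. update the application
        n.app = new_version
    record(5)
    for n in batch_a:                    # 6. swap the batches in the LB
        n.in_lb = True
    for n in batch_b:
        n.in_lb = False
    record(6)
    for n in batch_b:                    # 7. replicate to the remaining slaves
        n.schema = master["schema"]
    record(7)
    for n in batch_b:                    # 8. update the remaining applications
        n.app = new_version
    record(8)
    for n in batch_b:                    # 9. put the remaining machines back
        n.in_lb = True
    record(9)
    return log


master = {"schema": 1, "replicating": True}
nodes = [Node("web1"), Node("web2"), Node("web3")]
log = rolling_schema_update(master, nodes, new_version=2)
```

The log shows that at least one machine stays behind the load balancer at every step, which is the point of splitting the update into two batches. The listed steps do not say when automatic replication is resumed, so the simulation leaves it switched off.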

Using SolrCloud simplifies the entire process of updating the schema, as it does not require such extensive planning and manual intervention. SolrCloud keeps its configuration centrally in Apache ZooKeeper, which acts as the reference point for schema updates. We will discuss this in the chapter on SolrCloud.