Chapter 3: High-Density Line Sampling - Power BI: 3 in 1 - Comprehensive Guide of Tips and Tricks to Learn the Functions of Power BI + Simple and Effective Strategies + Advanced Guide to Learn the Advanced Realms of Power BI - Jones, Daniel

Since the June 2017 release of Power BI Desktop and the accompanying updates to the Power BI service, Power BI has used a new line sampling algorithm that improves visuals built on high-density data. Suppose you create a line chart from the sales figures of your retail stores, and every store issues hundreds of thousands (several lakh) of sales invoices over the year. A line chart of that data samples each store's invoices into a meaningful representation and draws a multi-line chart that reflects the underlying data. Sampling of this kind is common practice for visualizing high-density data, and the new algorithm does it more effectively.

Previously, Power BI selected a collection of sample data points from the full range of underlying data in a deterministic fashion. For example, a visual of high-density data spanning one year might show around 300 sample data points, each selected so that the full range of the data is represented. To see how this works, imagine plotting a stock price over one year using 365 data points in the line chart, that is, one data point for each day's stock price.

In this scenario, the stock takes many values during each trading day. The price changes every second, so each day has a high and a low, and either can occur at any time during trading hours. Now suppose the high-density data is sampled at fixed times, say 9:30 AM and 2:00 PM every day. Those samples give a snapshot of the day, but the true daily high and low may not fall at those times. The sampling would therefore miss the points that matter most here, the daily high and low of the stock price.
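The fixed-time sampling problem described above can be sketched in a few lines of Python. The prices and the helper `sample_at_fixed_times` are hypothetical and only illustrate the idea. They are not how Power BI samples data internally.

```python
# Hypothetical intraday prices for one trading day: (time "HH:MM", price).
ticks = [
    ("09:30", 100.0),
    ("10:45", 104.5),  # true daily high
    ("12:00", 101.2),
    ("13:10", 97.8),   # true daily low
    ("14:00", 99.4),
]


def sample_at_fixed_times(ticks, times):
    """Keep only the ticks whose timestamp is one of the fixed sample times."""
    wanted = set(times)
    return [(t, p) for t, p in ticks if t in wanted]


sampled = sample_at_fixed_times(ticks, ["09:30", "14:00"])
sampled_prices = [p for _, p in sampled]
all_prices = [p for _, p in ticks]

# The sampled range is narrower than the true range, so the chart
# would show neither the daily high nor the daily low.
print("sampled range:", min(sampled_prices), max(sampled_prices))
print("true range:   ", min(all_prices), max(all_prices))
```

Both the high at 10:45 and the low at 13:10 fall between the two sample times, so neither appears in the sampled series.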

By definition, high-density data must be sampled so that visuals can be created quickly and remain responsive and interactive. Too many data points on a visual can slow it down and obscure the trend it was created to show. How the data is sampled is therefore what matters, and the new sampling algorithm was designed to produce a good visualization for users automatically. It aims for the best combination of responsiveness, faithful representation of the data, and preservation of the important data points in each slice.

The algorithm applies to line chart and area chart visuals with a continuous X-axis. It slices your data into high-resolution chunks and then automatically picks the important data points to represent each chunk in the visual. The slicing is tuned so that the resulting chart is visually almost identical to one that renders every underlying data point, while being much faster and more interactive. The following minimum and maximum values apply to high-density visuals.

The maximum number of data points displayed on a visual is 3,500, regardless of the number of underlying data points. For example, with 20 series of 175 data points each, the visual has already reached the limit, since 175 * 20 = 3,500. With only one series, that series can have up to 3,500 data points under the new high-density line sampling algorithm.
The maximum number of series allowed on a visual is 60. If you have more than 60 series, split the data across several visuals with 60 or fewer series each. As a general practice, use a slicer to show only segments of the data. For example, a slicer lets you filter the full list of categories down to a subset on the same report page.
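The arithmetic behind these two limits can be checked with a small sketch. The function `points_per_series` is hypothetical, and the even split of the 3,500-point budget across series is an assumption made for illustration. The text does not say how Power BI divides the budget between series.

```python
MAX_POINTS = 3500  # per-visual data point limit stated in the text
MAX_SERIES = 60    # per-visual series limit stated in the text


def points_per_series(n_series, max_points=MAX_POINTS, max_series=MAX_SERIES):
    """Return (series kept, points available per kept series).

    Assumes the point budget is shared evenly across series.
    """
    if n_series < 1:
        raise ValueError("need at least one series")
    kept = min(n_series, max_series)
    return kept, max_points // kept


print(points_per_series(20))  # the 20 x 175 example from the text
print(points_per_series(1))   # a single series can use the whole budget
print(points_per_series(75))  # more than 60 series: only 60 are kept
```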

Note: Some visual types have a higher limit than 3,500 data points.

For R visuals, the maximum is 150,000 data points (1.5 lakh).
For Power BI visuals, the maximum is 30,000 data points.
For scatter charts, the maximum is 10,000 data points.
For all other visuals, the maximum is 3,500 data points.

Power BI Desktop enforces these limits so that visuals render quickly, stay interactive, and do not place an excessive computational load on the machine rendering them.

When the number of underlying data points exceeds the maximum that the visual can display, the binning process starts automatically. It splits the underlying data into groups called bins and then refines those bins iteratively.
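The trigger condition can be expressed as a simple lookup against the limits listed earlier. The dictionary keys and the function `exceeds_limit` are hypothetical names chosen for this sketch. Only the numeric limits come from the text.

```python
# Per-visual data point limits from the text; the key names are made up.
LIMITS = {
    "r_visual": 150_000,
    "power_bi_visual": 30_000,
    "scatter_chart": 10_000,
}
DEFAULT_LIMIT = 3_500  # all other visuals, including line and area charts


def exceeds_limit(visual_type, n_points):
    """True when the underlying data has more points than the visual may show."""
    return n_points > LIMITS.get(visual_type, DEFAULT_LIMIT)


# A line chart with one year of per-second prices is far over the limit,
# so this is the situation in which binning would start.
print(exceeds_limit("line_chart", 365 * 24 * 60 * 60))
print(exceeds_limit("line_chart", 3_500))
```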

The algorithm creates as many bins as it can in order to give the visual the greatest possible granularity. Within each bin, it finds the minimum and maximum data values so that the significant values are shown in the visual. Based on the results of binning and further evaluation of the data, Power BI determines the minimum resolution of the X-axis, which keeps the visual at maximum granularity.

Each bin is represented in the visual by two data points: its minimum and its maximum value. Choosing these two points ensures that the important values are captured. The process may sound complex, but it is exactly what solves the high-density data problem for users.
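The idea of keeping the minimum and maximum of each bin can be illustrated with a minimal sketch. It assumes equal-sized bins over points already sorted by X, which is a simplification: the text says Power BI refines its bins iteratively, and the exact procedure is not described. The function name `minmax_downsample` is hypothetical.

```python
def minmax_downsample(points, n_bins):
    """Reduce (x, y) points to at most two per bin: the lowest and highest y.

    `points` must be sorted by x. Bins are equal-sized slices of the list,
    which is an assumption of this sketch.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not points:
        return []
    n = len(points)
    n_bins = min(n_bins, n)  # never create empty bins
    kept = []
    for b in range(n_bins):
        chunk = points[b * n // n_bins:(b + 1) * n // n_bins]
        lo = min(chunk, key=lambda p: p[1])
        hi = max(chunk, key=lambda p: p[1])
        # Emit in x order; if lo and hi are the same point, keep it once.
        kept.extend(sorted({lo, hi}))
    return kept


points = [(0, 5), (1, 9), (2, 1), (3, 4), (4, 7), (5, 3), (6, 8), (7, 2)]
print(minmax_downsample(points, 2))
# [(1, 9), (2, 1), (6, 8), (7, 2)]
```

Eight points shrink to four, and the overall high (9) and low (1) both survive, which fixed-interval sampling cannot guarantee.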

One consequence of showing only the minimum and maximum values of each bin is worth noting: it affects what tooltips display when you hover over data points. To understand why, let’s revisit the stock price example.

Suppose you create a visual that compares two different stocks, both of which use high-density sampling. The stock price is captured every second, which produces a huge number of data points for each series. The algorithm bins each series independently of the other.

The first stock’s price jumps at 11:15 AM and drops back 30 seconds later. That is an important data point, so when binning runs, the high at 11:15 AM is chosen as a representative point for its bin. The second stock’s price at 11:15 AM is neither the high nor the low of the bin that contains that time. Suppose both the high and the low for that bin occur 20 minutes later.

Now, once the line chart is drawn, you hover over 11:15 AM. The tooltip shows a value for the first stock, because its high point falls at that time, but shows no value for the second stock, because the visual has no data point for it at 11:15 AM. This happens frequently with tooltips: the minimum and maximum values of a bin may not line up with the same X-axis values in every series, and the tooltip then has nothing to show for some of them.
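The tooltip behavior follows directly from each series keeping its own representative points. The sketch below uses hypothetical prices and a hypothetical `tooltip_at` helper. It models the tooltip as an exact lookup on the X value, which is an assumption about how hovering resolves.

```python
# Hypothetical points that survived binning, per series: time -> price.
sampled = {
    "Stock A": {"11:15": 152.3, "11:40": 149.0},  # high captured at 11:15
    "Stock B": {"11:35": 88.1, "11:36": 86.4},    # high and low 20 minutes later
}


def tooltip_at(sampled, x):
    """Return each series' value at x, or None when no point was kept there."""
    return {name: pts.get(x) for name, pts in sampled.items()}


tip = tooltip_at(sampled, "11:15")
print(tip)
# {'Stock A': 152.3, 'Stock B': None}
```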

Note: The new algorithm is turned on by default in Power BI. To change this setting, open the “Formatting” pane and go to the “General” card. At the bottom of the card there is a toggle named “High-Density Sampling.” Slide it to the left to turn it off.

The new algorithm significantly improves the sampling of high-density data in Power BI, but it has some limitations that you should know about:

Because of the binning and the increased granularity, tooltips may display a value only when a representative data point is aligned with the cursor.
If the overall data source is too large, the algorithm eliminates series (legend elements) to stay within the maximum data constraint. It takes the legend series in alphabetical order until the limit is reached and discards the remaining series.
If there are more than 60 series, the algorithm orders them alphabetically and drops every series after the 60th.
The underlying data must be numeric or date/time. For any other data type, the new algorithm is not used, and the data is handled by the previous, non-high-density sampling algorithm.
The “Show items with no data” setting is not supported by the algorithm.
The algorithm is not supported over a live connection to a model hosted in SQL Server Analysis Services version 2016 or earlier. It is supported for models hosted in Power BI or Azure Analysis Services.
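The 60-series rule from the list above is easy to sketch. The function `keep_first_series` is hypothetical, and Python's default string ordering stands in for whatever collation Power BI actually applies, which the text does not specify.

```python
def keep_first_series(names, max_series=60):
    """Order series names alphabetically and split them at the series limit.

    Returns (kept, dropped). Uses Python's default string ordering, which is
    an assumption; Power BI's own sort rules may differ.
    """
    ordered = sorted(names)
    return ordered[:max_series], ordered[max_series:]


# 75 hypothetical store series, zero-padded so alphabetical order is numeric.
stores = [f"Store {i:03d}" for i in range(1, 76)]
kept, dropped = keep_first_series(stores)
print(len(kept), kept[-1])        # 60 Store 060
print(len(dropped), dropped[0])   # 15 Store 061
```

One practical consequence is that series names late in the alphabet are the first to disappear, so a slicer is the safer way to choose which series a visual shows.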