Книга: Power BI: 3 in 1- Comprehensive Guide of Tips and Tricks to Learn the Functions of Power BI+ Simple and Effective Strategies+ Advanced Guide to Learn the Advanced Realms of Power BI
Назад: Chapter Two: Introduction to Power Pivot
Дальше: Chapter Four: DAX in Practice

 

 

 

At this point, before tackling a subject, which is certainly the one with the highest degree of complexity and at the same time the one that contributes the most value to our business analysis, we would like to take a little break from the evolution of the learning that we are doing daily, getaway like a bird and take an overview of the road we have traveled and the one we have yet to travel. To do this, we will review the development cycle of a Personal BI solution:

We start by identifying a Business Process and documenting it. From there, we design the Dimensional Model that will allow us to answer the business questions. As this model is empty (it still does not contain information), we must design an ETL (Extraction, Transformation, and Loading) process that incorporates the information into the model. From this point on, and depending on the tool we are using (in our case, Power Pivot), we must make the improvements we consider appropriate to the model.

We have already studied and practiced everything mentioned above with different real case studies. So we have two more phases to discover: Enrich the information through the DAX expression language and Perform the Visualization layer . Once put in context, we will focus on Power Pivot, and we will also see the road we have traveled and what remains to be done.

With what we have seen so far, we have made the dimensional model, which is the basis of our entire solution, we have created the ETL process (either with Power Query or Power Pivot) that has allowed us to load the information in our storage in Power Pivot, we have been able to improve the model (data types, relationships, hierarchies, etc.) and it is time to enrich the model through DAX.

DAX (Data Analysis Expressions) is a language of expressions that we can find inside Power Pivot, and it is based on formulas that allow defining personalized calculations. It consists of a library of functions and operators that can be combined to generate formulas and expressions. It is characterized by its simplicity of syntax and its similarity to Excel formulas and has even inherited many functions with the same name and functionality (except for small variations in a few exceptions).

Excel's formulas are oriented to calculations based on cells and ranges of these, for example, in cell A21, we can write the following expression:

=SUM(C1:G1)

We also have a new syntax, incorporated in the latest versions of Excel, that applies in the case that we have formatted a set of cells as a table, which allows us to reference columns in the table using the format "[@ColumnName]." Note that in tables, you can use both the traditional cell-based syntax and the new column-based syntax.

Why are we talking about these two syntaxes here? So to facilitate the introduction to the DAX syntax, which is more similar to the syntax we just saw, based on tables, although with some peculiarities:

Calculated fields are the new name given to what used to be called Measurements as of the 2012 version. It is simply a name change; the concept, functionality, and features remain the same. Here we will use the new name and in parentheses the old one, as you should know both, and you will find a lot of documentation on the internet using the term "Measures."

In DAX: Calculated fields = Measures

Where are the DAX formulas written? Within the Power Pivot window, although it is located in a different place and with certain nuances in the syntax, depending on whether it is a calculated column or a calculated field (measurement).

It is similar to sets of cells formatted as tables:

= "Expression."

= FUNCTION ("Expression1" ; "Expression2" ; ... )

To refer in any expression to the columns of a table, we will use the format 'Table' [Column], and follow the following rules:

The input of the calculated column expressions is done in a text box that is right at the top of the table, very similar to where you enter formulas in Excel. Once the expression has been entered, to change the name of the calculated column we right-click on it and choose the option "Change name of column" (this is the way to do it, since the name of the calculated column is not part of the DAX expression).

When we are writing an expression for a calculated column, we always get a contextual help that makes the task easier.

Calculated field := "Expression."

Calculated field := FUNCTION("Expression1" ; "Expression2" ; ... )

The same rules as above are used for the calculated columns, except that in this case, the name of the calculated field (measurement) does form part of the DAX expression and is indicated on the left side of the " := " symbols.

The process of entering calculated fields consists of positioning yourself in any free cell of the area of the calculated fields (the grid at the bottom of the table), clicking on the DAX expression area or the "F2" key, writing the DAX expression and pressing the "Enter" key.

Once a calculated column or a calculated field (measurement) is created, it is available to be used in any DAX expression we create. In fact, we recommend that whenever possible, you reuse them and avoid rewriting that part of the expression.

Whether you are using Excel in English or another language, only the original English function names will be valid. For example, we will always write SUM(... AVERAGE(..., etc. Translations such as SUM(..., AVERAGE(..., etc.) will not be valid.

A Calculated Column is a column that is added to an existing Power Pivot table, using a DAX expression that defines it. The value of the expression used is calculated for each row of the table at the same time it is created (at model design time) and then updated when the source data is updated (by pressing the " Refresh " button).

You can create calculated columns based on other calculated fields (measurements) and other calculated columns.

The calculated columns are added to the right of the columns we have extracted from the data source and automatically assigned the names CalculatedColumn1, CalculatedColumn2, and so on. Although you can rearrange the order of the columns by clicking on them and dragging them to the desired position, you can also rename any calculated column by right-clicking on it and choosing the option "Rename column."

Its values are stored on disk and in memory in the same way as any

other column in the table. Technically they are known as "materialized" or "persistent." They are designed to be static.

The calculated columns can be used later in Excel:

Let's look at some examples of use:

It's one more technique to denormalize

When we drag a column into the value area, something happens internally that we must understand, really what Power Pivot does is to create an "Implicit Calculated Field," this happens whenever we drag a column of a table, whether it is calculated or not, to the value area.

An Implicit Calculated Field is one that does not originally exist but is automatically created when you drag a column into the value area. From then on it exists as one more field which by default is assigned the aggregation function SUM and its default name is "sum of

Calculated_Column_Name". For example, if you have the SalesAmount column and drag it into the value area for the first time, the calculated field "Sum of SalesAmount" will be created automatically.

An Explicit Calculated Field is one that we create in design time in Power Pivot in the grid designed for it. Next, we will study in detail the calculated fields (measurements).

At this point, we ask ourselves the following question that has several answers, with its pros and cons: Where should I perform the calculations, with SQL when asking for information from the source, with Power Query, with calculated columns?

With SQL, we have two disadvantages, we need to know SQL language (it is evident), and in addition, the information is calculated in origin consuming resources of the server and traveling by the network once calculated. We do not recommend it in this case.

With Power Query, it is calculated during the execution of the data update process, it is done already at the destination, in the computer where it is running, and the column is stored like any other column in the source (we are not really able to distinguish when we are looking at the column information in Power Pivot if a column exists in the source or is calculated with Power Query. This is a good alternative, leaving all that calculation logic in Power Query, it has the advantage that if I need certain columns only for the calculation I can use them in the process and they don't have to get to the model in Power Pivot.

Using calculated columns, the DAX calculations are quite efficient, but on the other hand, it forces me to bring the columns involved in the calculation into the model and thus take up unnecessary space. What we can do is hide them from the user so that he does not see them.

Calculated Fields, also known as Measures in previous versions, are the calculations used for information analysis.

They are created when you write a DAX Provincement after you select a cell in the calculation area (this process has been seen in detail above).

Their values are not stored on disk (as is the case with the calculated columns) but are calculated by the engine when it receives the queries. Thanks to them, we will have almost infinite computing power. No, I'm not exaggerating. To make you aware from the beginning of the importance and scope of this incorporation, I am going to quote one of the greatest gurus in the field, Bill Jelen (Mr. Excel) :

"The Calculated Fields (Measurements) is the best thing that has happened to Excel in the last 20 years."

We may find that formulas are added and can be used in a dynamic table or graph, and so they are, but what is really relevant is that they offer unprecedented calculation power and flexibility.

If you have already used the pivot tables and you know the formulas that can be created in them, the new message is: "you will stop using them," it is no longer necessary, you have DAX that does everything those formulas and many other things. If you have not used this feature, it is not a problem, it is not well known, and you will not need it from now on either.

We consider it a fundamental question to know when to write a calculated column, and when a calculated (measured) field should be written. Although there is no single answer, we will give you a series of guidelines to help you in your decision. We will show you what to use in the following cases:

If for each row of the table, we have the number of units sold and the total amount of the row, but not the net unit price, and we want to calculate it.

Another common example is when we have a date, and we want to extract from its new columns with the year, month, quarter, day of the week, etc.

As we have seen above, calculated fields (measurements) can only be used in the value area (there is no way to place them in another area), while calculated columns can be used in addition to the value area, in segmenters ( slicers), filter area, rows or columns. We don't always have this need, and calculated columns have less flexibility than calculated fields, so think carefully when making your decision whether you really need to use it beyond the value area.

Calculated fields (measurements) are especially useful and flexible for performing calculations that affect sets of rows. For example, knowing the active customers (those we have sold to them over some time)

The next step is to get to know some fundamental concepts and the functions that make up the DAX language, to practice with them, and to show examples of use.

We are going to start with the study of some of the basic and most common DAX functions, and then go into more detail and additional concepts that we need to be clear about to obtain the best analytical result.

Let's start with the legacy Excel functions . If you see the name of a function that you already know from Excel, you can use it directly, since, in Power Pivot, it will have the same behavior and parameters that you have in traditional sheets. There will only be very few exceptions that we will see when the time comes to study each function in depth. Another difference, already mentioned above, is that there will be no translation but only the function with its English name.

The following image shows a list of these functions:

Legacy Excel functions

There is another group of functions, the Aggregation Functions, which come from the relationship of functions inherited from Excel and will make it easier for us to begin to understand the differences between traditional Excel and Power Pivot.

Aggregations allow you to contract, summarize, or group data. They are in charge of calculating a column from the maximum level of detail of its rows, affecting in principle all of them unless we indicate some kind of filter or descriptive breakdown.

Let's see an example, and if we want to know the total sales of the company, we will create a column calculated using the SUM() function.

So far, nothing new, we have already seen this function in previous examples, and it gives us the sum of all the values in the column "[SalesAmount]."

Let's now see what we have achieved by writing such a simple function for visualization and use in a pivot table. Note that we have also applied to format so that it always shows two decimals and the thousands of separators (how to do this was studied earlier).

If we create a pivot table connected to our model in Power Pivot and drag "TotalSales" to the value area, it shows us the result of adding "[SalesAmount]" for all the rows of the table and already formatted.

Now we break it down into a row for each store, and we break it down again with a column for each year.

And finally, we added a filter to have only the sales of "single" customers

(Marital Status=Single):

How many DAX formulas have you had to perform to obtain all this information regarding the company's sales amount?

JUST ONE!!! And keeping the same value in the

But, if you had used the SUM function in traditional Excel, how many times would you have had to change the cell ranges of the parameter that passes to that function to get that result? It's not even worth counting them, obviously one for each change.

That is where one of the elements that give that computing power and flexibility to perform analysis resides. But this is not the only element; later on, we will go deeper and understand why this behavior is so simple after creating a calculated (measured) field in DAX.

We will now explain the different types of aggregations that exist and the functions in each of them:

The following table shows the syntax of these functions and a brief description. For more details, you can see the product's help.

SUM(<Column>)

Add up all the rows in a column.

= SUM([SalesAmount])

COUNT(<Column>)

Count the number of rows in a column that has numerical data or dates.

= COUNT([SalesAmount])

COUNTROWS(<Table>)

Count the number of rows in a table.

= COUNTROWS('Client')

 

= COUNTROWS( RELATEDTABLE(

Sales' ))

AVERAGE(<Column>)

It returns the average of all the numbers in a column.

= AVERAGE([SalesAmount])

DISTINCTCOUNT(<Column>)

Count the number of rows with different values in a column.

=

DISTINCTCOUNT(Sales [Ticket])

MIN(<Column>)

Returns the smallest numerical value in a column that has numerical data or dates.

= MIN(Sales [StoreSK])

MAX(<Column>)

Returns the largest numerical value in a column that has numerical data or dates.

= MAX(Sales [StoreSK])

These functions allow us to navigate and obtain column values from various tables in the model as long as there are relationships between them.

The following table shows the syntax of these functions and a brief description. For more details, you can see the product's help.

RELATED(<Column>)

It returns a related value from another table column following the M->1 relationship.

Example: add a column calculated with the "[Population]" to the "Store" table (Many Stores -> 1 Population):

=RELATED(Geography(Population))

RELATED(<Table>)

It returns a table in a context specified by the filters indicated, following the relation 1->M. Example: add a column with the number of customers to the table "Geography":

= COUNTROWS(RELATEDTABLE('Client'))

 

The RELATED function can be used, for example, to un-normalized, creating calculated columns that allow columns from several related tables of the origin to be added to a single table.

We will come back to the RELATED function later, as it is used in conjunction with other functions, and we need to understand what contexts are and how they work.

 

Назад: Chapter Two: Introduction to Power Pivot
Дальше: Chapter Four: DAX in Practice