I n the last chapter on the development of an information system based on the Population Register, we explained the steps to follow to create the data table of an analysis model using the Power Query tools (language M) integrated with Power BI. The next stage in the development of this model is to create the lookup tables, a task for which we will also use the Query Editor. The source Excel file described in the previous delivery will be this time the primary source of data for the new tables that we will study in this chapter.
Once these search tables have been created, we will have all the elements to build the analytical model: relationships, measures, reports, dashboards, etc. through the designers of Power BI, this aspect will be addressed in an upcoming chapter.
We will start by importing the data located on the NACI sheet of the source Excel file, which corresponds to the nationality of the individual. From the main Power BI window, in the Home menu tab, External data group, Get Data | Excel, we will select the source Excel file.
Next, the Navigator window will offer us the possibility of choosing one or several sheets of this file. After selecting NACI, we will click on Edit, thus passing the data of the sheet as a new query to the Queries pane of the Query Editor window.
After changing the name of the query for Nationality in the Queries panel, we have to replace the current column titles (Column1 and Column2) with names that adequately describe their content.
One possibility is to reuse existing names in the Excel sheet from which we have imported this data. These names are currently in the third row of the table, so we will remove the first two rows (which do not contain valid information) using the menu option Remove Rows | Remove Top Rows, from the Reduce Rows group, Home tab, which will show a dialog box in which we will enter the number of rows to delete.
As a next step, we will transfer the values of the first row of the table to the column titles, using the Use First Row as Headers option, of the Transform group, Home menu tab.
We have explained this technique of assigning names to the headings of a table simply for demonstration purposes so that the reader knows the existence of this possibility. However, we will not reuse the column titles of the source Excel file, but we will assign new names explicitly; so in this table, we will change the names of the columns by NationalityID (Text) and Nationality (Text), by right-clicking on each title and selecting the Rename option.
This table also contains a significant amount of empty records at the end, which we will delete using the menu option. Remove Rows | Remove Blank Rows, from the Reduce Rows group, Home tab.
In the steps carried out to build the PopulationCensus table, we saw that the values used to identify the sex of the individual are 1 for men and 6 for women, so the base query for the search table of these data, being very simple, we will create it dynamically through an expression in M language.
First, we will select the menu option New Source | Blank Query, from the New Query group, Home tab, to create an empty query in which we will write the following block of code.
= Table.FromRecords (
{
[SexID = 1, Sex = "Man"],
[SexID = 6, Sex = "Woman"]
}
)
Let us analyze each of the parts of the previous expression: first, to define a record in M, we must enclose in square brackets the field names that compose it next to their values, using the format shown in the following figure.
Next, the records that will be part of the table will be included, separated by commas, in a list (List object). The key characters are those used in M to define a list of elements.
Finally, we will use the FromRecords function of the Table category, to which we will pass as a parameter the list we just created, obtaining a table as a result.
Once the table is created, we will change its default name to Sex; for the SexID and Sex columns, we will assign as data type Whole Number and Text, respectively.
During the creation stage from the data table, we explained that the residence of the individual is established from the combination of the province and municipality codes; These values are found in the CPRO and CMUN sheets of the Excel source, so to assemble the table of the place of residence we will first have to import these sheets into Power Query, just as we did before with the nationality table, changing the name of the CPRO table by ProvinceResidence, and the columns CPRO and Province of Residence by ProvinceResidenceID (Text) and ProvinceResidence (Text). Regarding the CMUN table, we will do the same name change operation for MunicipalityResidence, and the columns CPRO, CMUN, and NAME for ProvinceResidenceID (Text), MunicipalityResidenceID (Text) and MunicipalityResidence (Text).
Next, in the Queries panel, we will click on ProvinceResidence, and we will delete the first three rows with the menu option Remove Rows | Remove Top Rows. For the MunicipalityResidence query, we will delete the first five rows.
Reviewing the contents of these tables, we will appreciate that in MunicipalityResidence we have almost everything necessary; at the moment a column with the name of the province is missing, an issue that we will solve by crossing with ProvinceResidence, to transfer the data of the column of the same name.
The first requirement for a crossing of these characteristics is to be positioned in the table to which we are going to add data, so in the Queries panel we will click on MunicipalityResidence, and then we will select the Merge Queries menu option, from the Combine group, on the Home tab.
Then the Merge window will open, which in its upper part shows the origin table of the crossing: MunicipalityResidence. We will also have to select the ProvinceResidence table at the bottom, and in both the crossing and union field, which has the same name in the two tables: ProvinceResidenceID. The type of crossing or Join Kind, which in this case is Left Outer, indicates that for the operation, all the rows of the first table, MunicipalityResidence, and those rows of the second, ProvinceResidence, in which there is a match will be used. The last element of this window is an informational message with the number of rows between which there has been a match, and which in this example has been for all rows.
The consequence of this union operation when accepting the Merge window will be the aggregation, in the MunicipalityResidence table, of an additional column with the name NewColumn, containing the value Table in its cells. If we click on an empty area of any of these cells, the record obtained from the ProvinceResidence table, the result of the crossing operation, will be displayed at the bottom of the data area.
However, we have not yet transferred any data between the tables. Now that we have established a relationship between them thanks to the Merge Queries option, we must select what data from the ProvinceResidence table we want to take to MunicipalityResidence; what we will achieve by clicking on the icon next to the header of the NewColumn column, selecting the Expand option (default) and checking only the box in the ProvinceResidence column.
When you click OK, the selected action (Expand) expands the ProvinceResidence table to which each cell in the NewColumn column points, and as a result, brings the MunicipalityResidence a column with the name NewColumn.ProvinceResidence containing the names of the provinces.
After changing the name of the new column to ProvinceResidence, we will move its position until it is located next to ProvinceResidenceID. We will use the Move | Left, from the Any Column group, on the Transform menu tab. We will have to execute this option a couple of times until the column is placed in the desired position.
Place of residence. Creation of the code for bleached municipalities and union of consultations
When generating the code of the place of residence in the PopulationCensus table, we saw that for the municipalities with less than 10,000 inhabitants, the municipality code is not provided due to data protection issues. This had led us to decide to use the code 999 for such cases, which means that, for example, for province 08 (Barcelona), the code of the place of residence will be 08999.
The MunicipalityResidence table that we have just created would currently lack all this set of records that we do know about the province, but we do not know the municipality, and that is necessary for the measures that we are going to implement in the modeling phase to correctly perform their calculations. Then we explain a technique that will help us solve this problem.
In the ProvinceResidence table, we will add two new columns: MunicipalityResidenceID (Text) and MunicipalityResidence (Text), containing respectively the code 999 and the literal Not available.
Next, we will position ourselves in the MunicipalityResidence table and select the Append Queries menu option, from the Combine group, Home tab. In the Append window, we will select the ProvinceResidence table, and when you click OK, the records of the latter will be added to MunicipalityResidence, completing the code combinations necessary for this table.
We will finish the creation of this table by adding a new column that represents its identifier, to which we will give the name PlaceResidenceID (Text), composed of the union of the columns ProvinceResidenceID and MunicipalityResidenceID.
Once created, we will place this column in the first position of the table using the menu option Move | To Beginning, from the Any Column group, on the Transform tab.
The table with the age of the individuals will contain the values in the range of 0 to 99 years; for those with 100 years or more, the value 999 will be used for data protection reasons.
The M language allows us to generate a sequence of numbers using the Numbers function of the List category, passing as parameters the initial value and the number of numbers to be created.
= List.Numbers (0,100)
However, this is not enough, since we also have to add the number 999 to these values. To solve this problem, we will use the Combine function, also of the List category, which receives as a parameter an object of type List containing a combination of lists this function will return as a single list.
In this way, in addition to the list with the range 0-99, we will create another composed only by the number 999, which we will also pass to combine.
= List.Combine (
{
List.Numbers (0,100),
{999}
}
)
The previous block of code will be included in a new empty query (New Source | Blank Query menu option, from the New Query group, Home tab), which will create a unique list by adding it to the Queries panel.
In the next step, we will convert the list into a table with the To Table menu option, from the Convert group, located on the List Tools | Transform.
In the To Table window that opens next, we will accept the default values. Once the table is created, we will change its name to Age and the name of the column to AgeID (Whole Number). Next, we will add a new column with the name Age (Text) from the values of AgeID, but converted to character, using the following expression.
Number.text ([AgeID])
An aspect widely used in work with demographic data consists of analyzing the population's age in ten-year, five-year, or other range intervals. For this example we will define the following intervals: 0, 1-4, 5-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85 and more years.
The way to implement these intervals in the age table will be through the incorporation of a new column, which we will call Decennial Age (Text), in which we will include the following block of code, where based on the interval in which the age is placed, We will assign the corresponding literal.
if [AgeID] = 0 then
"0"
else
if [AgeID]> = 1 and [AgeID] <= 4 then
"1-4"
else
if [AgeID]> = 5 and [AgeID] <= 14 then
"5-14"
else
if [AgeID]> = 15 and [AgeID] <= 24 then
"15-24"
else
if [AgeID]> = 25 and [AgeID] <= 34 then
"25-34"
else
if [AgeID]> = 35 and [AgeID] <= 44 then
"35-44"
else
if [AgeID]> = 45 and [AgeID] <= 54 then
"45-54"
else
if [AgeID]> = 55 and [AgeID] <= 64 then
"55-64"
else
if [AgeID]> = 65 and [AgeID] <= 74 then
"65-74"
else
if [AgeID]> = 75 and [AgeID] <= 84 then
"75-84"
else
if [AgeID]> = 85 then
"85+"
else
""
Our next operation will be to create the table with which we can classify individuals according to the size by inhabitants of the municipality in which they reside.
To do this, we will import the TAMU sheet from the source Excel file. Once the data has been incorporated into Power Query, we will remove the last column of the table, which is empty, using the menu option Remove Columns, from the Manage Columns group, Home tab.
Next, we will change the name of the table to TamMunicipalityResidence, and the column names to TamMunicipalityResidenceID (Text) and TamMunicipalityResidenceOrigin (Text), eliminating the first three and last four rows using the Remove Top Rows and Remove Bottom Rows suboptions respectively, depending on the Remove Rows menu option.
Since the objective of this table will be to inform whether the municipality is Provincial capital, and otherwise, the range of inhabitants it has, the descriptors obtained from the Excel file may be excessively long and repetitive. For this reason, we will try to summarize them, so that we simply show the literal “Province Capital” when it is a Provincial capital city or the range of inhabitants in the rest of the cases.
We will create a new column with the name TamMunicipleResidence (Text) using the following block of code.
if Text.Contains ([TamMunicipleResidenceOrigen], "province") then
"Province capital."
else
if Text.Contains ([TamMunicipalResidenceOrigin], "hasta") then
"Up to 100."
else
if Text.Contains ([TamMunicipalResidenceOrigin], "de") then
Text.RemoveRange ([TamMunicipleResidenceOrigin], 0,24)
else
Text.RemoveRange ([TamMunicipleResidenceOrigin], 0.21)
The technique used in the previous code is to check the value of the TamMunicipalityResidenceOrigin column using the Text.It contains a function, which returns true or false, depending on whether or not, within the text passed as the first parameter, the character string of the second parameter. In cases where the word "province" exists, we will know that it is a Provincial capital city and we will return the literal "Province Capital"; If it contains the word "up to" it will be a municipality with a maximum of 100 inhabitants, and we will return "Up to 100".
For the rest of the occurrences, we are only interested in taking the text fragment that indicates the range of inhabitants, eliminating the rest with the Text.RemoveRange function, which is responsible for eliminating, in the text of the first parameter, a substring whose start index we indicate in the second parameter, while its length is specified in the third parameter. In these values, the amount of text to be deleted will depend on the existence of the "de" particle in the value of the column.
Regarding the size by inhabitants of the municipality of birth (sheet ALSO of the Excel source file), the creation of your query is equal to the process just described, so we will avoid repeating it here.
As we had explained in the previous chapter, the table with the information of the place of birth that we will build next will have a structure that will allow us, during the design of the data model, to create a hierarchy based on the levels of country, province, and municipality.
First, we will import the CPRON and CMUNN sheets from the source Excel file, which in the Queries panel, we will change their name to Province of Birth and Municipality of Country of Birth, respectively.
Placing ourselves in the Province of Birth table, we will rename its columns by Province Birth Birth (Text) and Birth Province (Text); then, we will delete the first three rows.
Then we will position ourselves in the MunicipalityCountryBirth table, where we will remove its empty columns, and then, from the Queries panel, we will right-click on this same table, selecting the Duplicate option. We will change the name to the new duplicate table by Country.
We will deal with this table a little later, now we will return to MunicipalityCountryBirth, and first, we will delete the last three columns, corresponding to the country data, using the Remove Columns menu option
Continuing with the delete operations, then we will use the menu sub-options dependent on Remove Rows to delete the first eight rows, as well as those rows with null values. We will also change the default names of the columns to ProvinceBirthID (Text), MunicipalityBirthID (Text), and MunicipalityBirth (Text).
As it happened during the preparation of the table of the municipality of residence, the MunicipalityCountryBirth table has the province code but not the names, so we will make a combination (Merge Queries menu option) with the ProvinceBirth table, to transfer from this Last column with the province names to the table of municipalities. We will not repeat the steps on how to make a combination of tables since this operation was described earlier during the preparation of the tables relating to the place of residence. Once the crossing is finished, and the positions of the columns are organized.
As indicated in the source Excel file, information on births in municipalities with less than 10,000 inhabitants is bleached for data protection, which means that for these cases, we will know the province of birth but not the municipality. The technique that we will use to incorporate these data in the MunicipalityCountryBirth table will be the same as that previously used for the MunicipalityResidence table.
First, we will add two new columns to the ProvinceBirth table: MunicipalityBirthID (Text) and MunicipalityBirth (Text), containing respectively the values 999 and Not available.
Then, using the Append Queries menu option, we will add to the MunicipalityCountry table the content of the ProvinceBirth table. As a result, MunicipalityCountryBirth will now contain all possible combinations of province and municipality, including those of bleached municipalities (not available) because they have a population of fewer than 10,000 inhabitants.
The next step will be to create two new columns: CountryBirthID (Text) and CountryBirth (Text), to host the country data. Currently, the table only has information on provinces and municipalities in the USA, so the values of the two new columns will be country code 108 (see the reader for the Nationality table) and the literal USA, respectively. After rearranging the columns in a more logical order (country - province - municipality).
Now it is necessary to add the data of the rest of the countries, so we will place ourselves in the CountryBirth table to prepare its structure and contents so that we can combine it with MunicipalityCountryBirth.
The columns of the CountryBirth table that we need to work with are the last 3, which correspond to the province code 66, indicative of being born abroad, as well as the country code and name.
As an alternative way to delete columns, we will select the columns that contain the data we need, and we will execute the Remove Columns | Remove Other Columns from the Manage Columns group, Home menu tab; thus deleting the columns of the table that we had not selected.
This table contains a large number of empty records (null, blank, etc.) that we will delete with the Remove Rows | Remove Blank Rows, used on other occasions throughout for similar situations. After this deletion, we will also delete the first two rows, to keep only the data of the countries.
The next step will be to change the name of the columns by Province Birth Birth (Text), Country Birth ID (Text), and Country Birth (Text).
Now we must match the structure of this table with the MunicipalityCountryBirth table, adding the columns ProvinceBirth (Text), MunicipalityBirthID (Text), and MunicipalityBirth (Text). Since the CountryBirth table contains only the data for the population born abroad, we do not have, of course, data on province or municipality, so we will assign values to these new columns according to this situation: 999 for NacimientoID Municipality and the literal Not available for ProvinceBirth and MunicipalityBirth.
Next, we will add the records of this table to MunicipalityCountryBirth through the menu option Append Queries, obtaining. As a result of all combinations of codes that will allow us to locate the population geographically.
After adding this data, we need to remove from the MunicipalityCountryBirth table the row that contains the foreign value in the ProvinceBirth field, since it corresponds to a non-existent case in the PopulationCensus data table: individual born in the country but classified as a foreigner by province.
The operation that we are going to carry out to achieve this objective will not consist, however, in the deletion of a row, but we will apply a filter on the table, which excludes the record in question; for what we will click on the filtering icon in the Birth Province column, selecting the Text Filters | Does Not Equal. In the Filter Rows window shown below, we will enter the foreign value and accept; this will apply the filter, excluding said row from the rest of the rows of the MunicipalityCityBirth table.
As the last task, we will add a column that will act as the identifier of the table, which we will call Birth PlaceID (Text). This column will be formed by the concatenation of the CountryBirthID, ProvinceBirthID, and MunicipalityBirthID columns; Once created, we will place it as the first column of the table.
[CountryBirthID] & [ProvinceBirthID] & [MunicipalityBirthID]
At this point, we can conclude the process of creating the tables that will be part of the data model. However, we will not need to use in the modeling process all the tables that the Query Editor currently contains, so we will disable their loading to achieve a cleaner design of the model and optimize the processing of your data.
Located in the Queries panel, to disable the loading of a table, we will right-click on it, unchecking the Enable load option box. The tables on which we will carry out this operation are ProvinceResidence, ProvinceBirth, and CountryBirth.
On the other hand, we will re-enable data loading for the PopulationCensus table, since we had previously disabled it during the preparation phase of the model search tables, also for performance reasons.
In this and the previous chapter, we have addressed the creation of the search tables or lookup tables of an analysis system built using Power Query ETL tools integrated into Power BI. In the next chapter, we will discuss those aspects of the information system that concern the design of the data model: relationships between tables, creation of hierarchies, measures, etc., as a previous step to the visual representation of the information contained in the system.