Project

Mastering DAX Table Functions: A Practical Approach

A comprehensive study project to master DAX table functions on Power BI. The project aims to advance the user from intermediate to power-user level focusing on creating and utilizing virtual tables.

Empty image or helper icon

Mastering DAX Table Functions: A Practical Approach

Description

This project is tailored for Power BI users who have an intermediate understanding of DAX and aim to be advanced users. We will shine a light on all key table functions, allowing the user to create and manipulate virtual tables with ease and efficiency. Through numerous examples, the user will be provided with real-world context that underscores the practicality and applicability of these functions. This progressive learning approach will not only increase knowledge depth but also improve the practical skill-set.

Unit 1: Introduction to Virtual Tables in DAX

Data Analysis Expressions (DAX) is a library of functions and operators that can be combined to build formulas and expressions in Power BI, Analysis Services, and Power Pivot in Excel. Virtual Tables in DAX are used as the source table for many DAX functions and play a crucial role in many analysis scenarios.

Setup

Ensure that you have the following software installed on your machine:

  • Power BI Desktop

You can download Power BI Desktop from the official Microsoft link - PowerBI Desktop

After installing Power BI Desktop, launch the application.

Section 1: Loading the Data

For our demonstration, we will be using a simple dataset in CSV format that contains sales data. Load the CSV file into Power BI by following the steps:

  1. Select Get Data on the top bar of Power BI Desktop, click on the Text/CSV option.

  2. Navigate to the location where you have stored your dataset and select your CSV file.

  3. Click load which roughly transcribes the CSV file into a table that DAX can utilize.

Section 2: Creating Calculated Tables

After data loading, the next step will be creating calculated tables. Perceive a calculated table as a type of virtual table that you define in a tabular model's calculation area. Here are the steps:

  1. Click on Modeling in the top menu, then New Table.

  2. We are going to create a virtual table that consists of unique products sold. Type the following:

UniqueProduct = DISTINCT('SalesData'[Product])

Above, SalesData is name of the table loaded in the model and Product is the column in that table.

Section 3: Utilizing Virtual Tables in DAX functions

Finally, let's use the SUMMARIZE function which is often used in combination with other DAX functions and serves to summarize a table/timeline of data. In our case, let's compute the total quantity sold per product. Type this into the new table under the Modeling tab:

ProductQuantity = 
   SUMMARIZE (
     SalesData,
     SalesData[Product],
     "Total_Quantity", SUM(SalesData[Quantity])
   )

The SUMMARIZE function above creates a new table with unique Products(<SalesData>[Product]) and sum of quantity(SUM(SalesData[Quantity])) for each product.

Now, you have a basic understanding of how to create and utilize virtual tables in DAX. This is, of course, a simple demonstration and real-world scenarios may require more complex approaches and functions.

Remember, practice is the key to mastering DAX. Continue experimenting with different functions and applying them to your reporting needs!

Unit 2: Handling Relationships with RELATED Function

The RELATED function is a powerful feature in DAX (Data Analysis Expressions) used in Power BI for fetching data from a different (related) table using the relationship between tables.

For demonstration purposes, let's pretend we have two tables: Sales and Products with established relationships.

Sales Table

SaleID ProductID Quantity
1 101 5
2 102 2
3 103 1
4 101 3
5 104 4

Products Table

ProductID ProductName ProductPrice
101 Product A 10
102 Product B 15
103 Product C 20
104 Product D 25

We'll use the RELATED function to pull the ProductName and ProductPrice from the Products table to the Sales table based on the ProductID.

Implementing the RELATED function

To implement a RELATED function, follow these steps:

  1. Create a calculated column

Add a new column to the Sales table that will store the product names fetched from the Products table.

Sales[ProductName] = RELATED(Products[ProductName])

We'll do the same for the Product Price:

Sales[ProductPrice] = RELATED(Products[ProductPrice])

Now our Sales table looks like:

Sales Table

SaleID ProductID Quantity ProductName ProductPrice
1 101 5 Product A 10
2 102 2 Product B 15
3 103 1 Product C 20
4 101 3 Product A 10
5 104 4 Product D 25
  1. Calculate total sales per product

Now that we have the Sales table enriched with product details, we can calculate the total sales for each product.

Sales[TotalSale] = Sales[Quantity] * Sales[ProductPrice]

and our Sales table now is:

Sales Table

SaleID ProductID Quantity ProductName ProductPrice TotalSale
1 101 5 Product A 10 50
2 102 2 Product B 15 30
3 103 1 Product C 20 20
4 101 3 Product A 10 30
5 104 4 Product D 25 100

Congratulations, you have just used the RELATED function to enrich one table with details from a related table and then used that additional information to perform calculations. This concept applies not only to Power BI but also to any kind of relational data model.

Implementing Aggregations using SUMMARIZE and ADDCOLUMNS Functions

To practically demonstrate the applications of the SUMMARIZE and ADDCOLUMNS functions for data aggregations, this guide assumes the presence of a sample dataset named 'Sales' in the Power BI environment. This dataset has the columns 'Product', 'Region', 'SalesAmount', 'Cost' and 'UnitsSold' for simplicity. Different scenarios of applying SUMMARIZE and ADDCOLUMNS functions will be demonstrated.

To start with, ensure that your Power BI environment is set up properly and the Sales dataset has been loaded correctly.

Section I: Using SUMMARIZE Function

The SUMMARY function is used for grouping data and performing an operation such as count, sum, average, minimum, or maximum on the grouped data.

Sales_Sum = SUMMARIZE( 
    Sales, 
    Sales[Product], 
    Sales[Region],
    "Total Sales", SUM(Sales[SalesAmount]), 
    "Total Cost",  SUM(Sales[Cost]),
    "Total Units", SUM(Sales[UnitsSold])
)

In the given formula, SUMMARIZE function is creating a summary of the 'Sales' dataset by grouping based on 'Product' and 'Region', and then calculating the sum of 'SalesAmount', 'Cost', and 'UnitsSold' for each group.

Section II: Using ADDCOLUMNS Function

The ADDCOLUMNS function is used to add calculated columns to the given table in Power BI.

SalesDetails = ADDCOLUMNS( 
    Sales, 
    "Profit", Sales[SalesAmount] - Sales[Cost],
    "ProfitMargin", DIVIDE((Sales[SalesAmount] - Sales[Cost]), Sales[SalesAmount])
)

In the provided formula, ADDCOLUMNS function is adding two new columns – 'Profit' and 'ProfitMargin' to the 'Sales' table. 'Profit' is calculated by subtracting 'Cost' from 'SalesAmount' whereas 'ProfitMargin' is calculated by dividing 'Profit' by 'SalesAmount'.

Section III: Combining SUMMARIZE and ADDCOLUMNS Functions

Both SUMMARY and ADDCOLUMNS can be combined to provide a more thorough data analysis.

CombinedSalesDetails = ADDCOLUMNS( 
    SUMMARIZE( 
        Sales, 
        Sales[Product], 
        Sales[Region],
        "Total Units", SUM(Sales[UnitsSold])
    ),
    "Total Sales", [Total Units] * AVERAGE(Sales[SalesAmount]),
    "Total Cost",  [Total Units] * AVERAGE(Sales[Cost]),
    "Profit",      [Total Sales] - [Total Cost],
    "ProfitMargin",DIVIDE(([Total Sales] - [Total Cost]), [Total Sales])
)

In this example, SUMMARIZE function is creating a summary of the 'Sales' dataset by grouping the data based on 'Product', and 'Region', and then calculating sum of 'UnitsSold' for each group. Subsequently, ADDCOLUMNS function is adding calculated columns for 'Total Sales', 'Total Cost', 'Profit', and 'ProfitMargin'.


Note: The syntax and column names used might differ based on the actual dataset and requirements. The examples provided are generic. Please replace with appropriate dataset/column/field names as necessary.

Please try the provided DAX scripts in your Power BI environment using your 'Sales' dataset.

Unit 4: Developing Filters with CALCULATETABLE and FILTER

In this section, we will cover the practical implementation of the CALCULATETABLE and FILTER functions to create a dynamic filter. These functions are an important part of Data Analysis Expressions (DAX) and serve a broad range of functionalities in Power BI.

Creating Filters with CALCULATETABLE

First, let's review the CALCULATETABLE function syntax:

CALCULATETABLE(,,,...)

CALCULATETABLE function receives a table as input and returns an altered version of that table according to the filters applied.

Let's say that you have an existing Sales table with a number of columns, and you want to create a new table that only contains this year's sales data.

SalesThisYear = CALCULATETABLE(
    Sales, 
    YEAR(Sales[OrderDate]) = YEAR(TODAY())
)

Developing Filters with FILTER

Now, let's look into the FILTER function:

FILTER(
, )

The FILTER function evaluates <expression> for each row of <table> and only keeps the rows where <expression> is TRUE.

For example, from the same Sales table, we want to have a table that only includes recrods where product quantity is more than 500; we could implement the FILTER function like this:

HighQuantitySales = FILTER(
    Sales, 
    Sales[Quantity] > 500
)

Combining CALCULATETABLE and FILTER

For a more complex scenario, we will now combine CALCULATETABLE and FILTER functions.

In this scenario, we want to retrieve sales that were made in the summer months and where the sales quantity was more than 500. A summer month is one where the 'Month' value is between June and August (6 and 8).

Here's how you might do that:

SummerHighQuantitySales = CALCULATETABLE(
    Sales,
    FILTER(
        Sales,
        Sales[Quantity] > 500
    ),
    FILTER(
        ALL(Sales[Month]),
        Sales[Month] >= 6 &&
        Sales[Month] <= 8
    )
)

In this example, CALCULATETABLE uses two FILTER expressions to create a new table. The first FILTER expression returns a new table of sales where the quantity is greater than 500. The second FILTER expression further filters that table to only include sales from the summer months. The ALL function is used to clear any previously applied filters on the Sales[Month] column. When these two FILTER functions are used in combination with CALCULATETABLE, they operate on the result of previous filters, effectively creating an "AND" operation between the filters.

This concludes the practical application of CALCULATETABLE and FILTER DAX functions. Note that understanding and mastering these table functions is crucial as they form the basis of many complex DAX formulas and use-cases.

Unit 5: Advanced Techniques for JOIN Functions

In this unit, we will explore the advanced techniques for JOIN functions in DAX. Specifically, we will focus on two primary DAX JOIN functions - NATURALINNERJOIN and NATURALLEFTOUTERJOIN. These functions merge two tables into one, based on shared column names.

Prerequisites

Before we dive in, ensure you're already familiar with basic DAX functions, table functions, and table relationships. Also, it's assumed that you've completed the initial four units of this project.


Practical Implementation

Suppose we have two tables, Sales and Products. Let's first display these tables using EVALUATE.

// display Sales table
EVALUATE Sales

// display Products table
EVALUATE Products

1. NATURALINNERJOIN

The NATURALINNERJOIN function in DAX performs an inner join of two tables based on their shared column names. Only the rows containing matching values in the shared columns are returned. If a column name in table1 exists in table2, the join uses that column. The returned table includes all shared columns.

Let's join our Sales and Products tables on their shared columns.

EVALUATE
NATURALINNERJOIN(Sales, Products)

2. NATURALLEFTOUTERJOIN

The NATURALLEFTOUTERJOIN function in DAX performs a left outer join of two tables based on their shared column names. All rows from the left table and any rows in the right table that match up are returned. If a shared column name is in both tables, the join uses that column. The resulting table includes all shared columns.

In the following example, a NATURALLEFTOUTERJOIN operation is executed between Sales and Products.

EVALUATE
NATURALLEFTOUTERJOIN(Sales, Products)

Remember, in both functions, a maximum of two tables can be joined, and the joined tables should share at least one column name.

By mastering these JOIN functions, you can combine, filter, and aggregate data from different tables in your DAX data model. This allows for comprehensive analyses of the interactions between different sections of your data, providing you with a deeper understanding of your data's structure and relationships.

Hope this gives you a real-life implementation of advanced techniques for JOIN functions in DAX. Now you can easily apply these techniques in your DAX queries to manipulate data in a more advanced way.

Unit 6: Row Context and EARLIER Function in Depth

In Data Analysis Expressions (DAX), row context refers to the context where each row is evaluated individually in relation to other rows. DAX provides certain functions to manipulate, evaluate, or compare the values of the current row against other rows. One of these functions is the EARLIER function.

The EARLIER function is quite handy when you want to use a certain value of the current row within an aggregation function, such as SUM, AVERAGE, COUNT, etc. In other words, it allows you to access data from an "earlier" row context in the calculation chain.

Sample Dataset

The sample dataset Sales has following fields:

  • SalesID: Unique identifier for each sale transaction
  • Year: Year of the sale
  • ProductName: Name of the product sold
  • UnitsSold: Number of units sold
  • PricePerUnit: Price per unit of the product

EARLIER Function Usage

Let's try to calculate the cumulative sales for each product for each year.

The DAX formula for this would be:

Cumulative Sales =
CALCULATE(
  SUM(Sales[UnitsSold] * Sales[PricePerUnit]),
  FILTER(
    ALL(Sales),
    Sales[Year] <= EARLIER(Sales[Year]) && 
    Sales[ProductName] = EARLIER(Sales[ProductName])
   )
 )

Breakdown of the formula:

Here, CALCULATE function modifies the context in which the data is filtered. In this case, it's calculating the total sales (SUM(Sales[UnitsSold] * Sales[PricePerUnit]), but for a filtered subset of data.

Inside CALCULATE, the FILTER function generates a new table that includes all rows (because of ALL(Sales)) that meet the specified conditions.

The conditions we have are Sales[Year] <= EARLIER(Sales[Year]) and Sales[ProductName] = EARLIER(Sales[ProductName]), which basically compares the current row's 'Year' and 'ProductName' with all other rows in the table, gathering only those rows that have the same product name and a year less than or equal to the current row's year.

This way, for each row, we get cumulative sales by year for each product. EARLIER function here allows us to refer to the current row's values while running the aggregate calculations inside CALCULATE.

Now you can add [Cumulative Sales] to any report/table vizualization you have in Power BI, and it will display the cumulative sales by each year for each product.

Unit 7: ALL Function Variations and Their Uses

The ALL Function in DAX (Data Analysis Expressions) essentially removes any filters that might've been applied to the column or table being referenced. It's often used in combination with other functions to create complex expressions. In this unit, we'll explore the use of the ALL function - along with some of its variants (ALLSELECTED, ALLEXCEPT), and how these can be utilized in real-world scenarios.

1. ALL Function in DAX

Let’s consider an example where you're working with a sales dataset, and you want to calculate the total sales. Later, you want to compare individual product sales with the total sales.

Imagine a table - FACT_Sales:

Product_ID Sales_Amount
1 200
1 300
2 150
2 170
3 300
3 100
Total_Sales = SUM( FACT_Sales[Sales_Amount] )

Product_Total = CALCULATE (
    SUM ( FACT_Sales[Sales_Amount] ),
    ALL ( FACT_Sales[Product_ID] )
)

Total_Sales is straightforward, it’s just the sum of Sales_Amount. Product_Total uses CALCULATE to change the context of the calculation. It removes filters on Product_ID by using ALL, thus giving us overall total sales.

2. ALLSELECTED Function

ALLSELECTED returns all the rows in a table (or all the values in a column), while keeping intact the filters that have been applied to other columns.

Continuing the previous example, suppose you've a slicer connected to your sales dashboard, where you can pick the year. If you select 2020 and then calculate total sales using ALL, it will not respect your slicer selection. This problem is solved with ALLSELECTED.

Product_Total_AllSelected = CALCULATE (
    SUM ( FACT_Sales[Sales_Amount] ),
    ALLSELECTED ( FACT_Sales[Product_ID] )
)

This version respects filters on other columns, while removing filters on the Product_ID column.

3. ALLEXCEPT Function

ALLEXCEPT is a variation that keeps all filters on all other columns, except the ones specified. Suppose, to calculate the total sales for each product in a given year, you might do:

Product_Total_AllExcept = CALCULATE (
    SUM ( FACT_Sales[Sales_Amount] ),
    ALLEXCEPT ( FACT_Sales, FACT_Sales[Year] )
)

This variation will give us a total sales for each product and respect all filters except the ones on 'Year'.

In conclusion, the ALL function and its variants can be used effectively to control the context of your calculations in DAX according to your specific requirements.

Unit 8: Dealing with Blanks in DAX Tables

In DAX, dealing with blanks might be necessary especially when we are dealing with real world data, as there is often the chance that some values are missing in certain columns. These missing values are represented as blank in DAX. By understanding how to handle these blanks can aids in creating more robust and error-free reports.

A. Using IF to Replace Blanks

Often we may want to replace blank values with a placeholder or another value, and the IF function in DAX allows us to do this effectively.

Suppose we have a 'Sales' table, which contains a 'Product' and 'Price' column. For some reason, some values in the 'Price' column are left blank. We can replace these blank values with 0 using the IF function.

Sales := 
ADDCOLUMNS (
    'OriginalSales',
    "Price", IF ( ISBLANK ( 'OriginalSales'[Price] ), 0, 'OriginalSales'[Price] )
) 

Here, ISBLANK is used to query whether a cell is blank. If it is, the IF function replaces it with 0.

B. Excluding Blanks Using FILTER

We might want to exclude rows containing blanks from our analysis. We can do this by creating a new table that only contains rows where the 'Price' is not blank using FILTER.

FilteredSales :=
FILTER (
    'OriginalSales',
    NOT ( ISBLANK ( 'OriginalSales'[Price] ) )
)

Here, ISBLANK is again used to check whether a cell is blank, but negated with the NOT function, so the filter only includes rows where the 'Price' is not blank.

C. Handling Blanks in Measures

Sometimes you would like to display different value in measures when the result is blank. For instance, if we have a measure [Total Sales] that sums up the 'Price' column and we would like to display "N/A" when the result is blank, we can modify our measure as follows:

Total Sales :=
IF (
    ISBLANK ( SUM( 'Sales'[Price] ) ),
    "N/A",
    SUM( 'Sales'[Price] )
)

This defensive coding technique helps to improve the readability of your reports.

D. Counting Blanks with DAX

If you want a count of rows where the 'Price' value is blank you can do so using the COUNTBLANK function.

Blank Count :=
COUNTBLANK ( 'Sales'[Price] )

Conclusion

Blanks in your data can lead to incorrect calculations and misleading reports. These techniques shown here provide several ways you can handle blanks in DAX, but note that the best treatment always depends on the specific context and meaning of the data.

Unit 9: Practical Examples: Nested Functions and Complex Tables

Objective: To demonstrate the practical use of nested functions and complex table structures in DAX (Data Analysis Expressions)

Note: This unit assumes that you have a data model defined with necessary relationships needed for these examples.

I. Nested Functions in DAX

Nested functions are a concept in DAX where one function is used as an argument for another function. This functionality essentially allows a DAX function to use the result of another DAX function for computation. In order to illustrate the use of nested functions effectually, we will use a combination of DAX functions (such as SUM, CALCULATE, FILTER and ALL) in the examples below.

Example: Accumulative Sales per Category

This cumulative sales DAX measure uses nested SUM and CALCULATE functions together with a filter ALL.

Accumulative Sales per Category = 
CALCULATE(
    SUM(Sales[SalesAmount]),
    FILTER(
        ALL(Sales[OrderDate]),
        Sales[OrderDate] <= MAX(Sales[OrderDate])
    ),
    VALUES(Products[Category])
)
  • It first accumulates SalesAmount until the latest order date with SUM(Sales[SalesAmount]).
  • The FILTER function is then used to include all dates that are less than or equal to the latest order date.
  • VALUES(Products[Category]) makes sure the calculation is done per each category.

II. Complex Tables in DAX

Complex table structures entail nested tables, tables with multiple conditions and tables that use DAX functions to compute and filter data.

Example: Sales by Category and Year with Conditions and Nesting

This complex table DAX measure generates a summarized table presenting sales per category and year, but only including categories and years with sales above a given threshold.

Sales Summary = 
ADDCOLUMNS(
    SUMMARIZE(
        FILTER(
            ALL(Sales[OrderDate], Products[Category], Sales[SalesAmount]),
            [Total Sales by Category and Year] > Threshold_Value
        ), 
        Products[Category],
        "Year", YEAR(Sales[OrderDate])
    ),
    "Total Sales by Category and Year", [Total Sales by Category and Year]
)
  • The FILTER function is used to include all sales data but only where [Total Sales by Category and Year] > Threshold_Value.
  • SUMMARIZE then creates a summarized table based on the filtered sales data.
  • ADDCOLUMNS finally adds a column to the summarized table to display the total sales by category and year.

You can replace Threshold_Value with the desired sales threshold. The [Total Sales by Category and Year] measure needs to be defined in your DAX model.

Please remember to tailor the table and field names according to your data model for these formulas to work.

Unit 10: Case Study: Applying DAX Table Functions in Real-World Scenarios

For this case study, we'll be making use of a hypothetical dataset from a bookstore, and we will focus on using a combination of various DAX functions to solve two real-world scenarios: Analyzing best selling books and finding target customers. We will be implementing all DAX functions via the Power BI environment.

Let's assume we have three primary tables in our data model: Sales (Sales), Books (Books), and Customers (Customers).

Note: For this exercise, it's not important what these tables contain, we'll just assume that the relationships between these tables are already properly set up.

Scenario 1: Analyzing best selling books

We want to find the top 3 best selling books in terms of quantity sold.

Here is a DAX measure that does exactly that:

Top 3 Best Selling Books = 
VAR SellingBooks = SUMMARIZE(
    Sales,
    Books[BookTitle],
    "Total Quantity Sold", SUM(Sales[Quantity])
)
VAR Top3Books=TOPN(3, SellingBooks, [Total Quantity Sold], DESC)
RETURN
    CONCATENATEX(Top3Books, [BookTitle], ",")

Where:

  • SUMMARIZE generates a summary table over a set of groups.
  • TOPN returns the top N rows of a table.
  • CONCATENATEX concatenates the result of an expression evaluated for each row in a table.

The returned string will contain the top selling book title first, followed by the second and third best selling book.

Scenario 2: Finding target customers

Assume the marketing department wants to target customers who have purchased at least 5 different books.

We will use the COUNTROWS and DISTINCT functions, which allow us to manipulate data in a table and return the count of rows that are unique in a column, respectively.

Here's the DAX formula:

Target Customers =
VAR PurchasedBookCount = SUMMARIZE(
    Sales, 
    Customers[CustomerName], 
    "Distinct Book Count", COUNTROWS(DISTINCT(Sales[BookID]))
)
VAR TargetedCustomers = FILTER(
    PurchasedBookCount, 
    [Distinct Book Count] >= 5
)
RETURN
    CONCATENATEX(TargetedCustomers , Customers[CustomerName], ",")

Where:

  • FILTER allows us to reduce the number of rows in the table where the conditions that follow the function are met.

The return value will be a string, which contains the names of the target customers.

In this case study, by using DAX table functions, we are able to analyze complex real-world scenarios.