Search

Search this blog:

Search This Blog

It's Complicated: Relationships in Power BI Part 2 (Cardinality)


Get Data > Transform and Clean Data > Check Relationships > Visualize Data

This is the general order of operations for building a successful report in Power BI desktop. In this series, I want to discuss relationships in Power BI and why it's important for you to have a basic understanding of the underlying Power BI data model. 

In It's Complicated: Relationships in Power BI Part 1 (Fact vs Dimension) we discussed how to identify Fact and Dimension tables in a typical star schema: 

  • Fact table - usually one per report (though it's possible to have multiple fact tables in one data model) that is the central table of transactions (sales, events, logs, requests, audits, etc). 
  • Dimension tables - as many as needed to store commonalities, demographic or dimension data about the transactions (customers, locations, dates, leave type, department, etc)
Once you have identified the Fact and Dimension tables, you must identify the key identifying columns that relate the two tables. In Power BI we can only select one column from each table to make this connection. In Part 1 of this series, we chose the key identifying columns based on common sense and common information between the tables. 

In this post, we are going to look at the Cardinality of the relationship between two tables, and the role it plays in choosing the key identifying columns. 

Cardinality

Cardinality is defined as 'the number of elements in a set'. In order for this to make sense with our data, we must think of our set as a single value for one of our key identifying columns. With most data models, you have three main choices for cardinality; 
  • 1 to 1
  • 1 to many / many to 1
  • many to many
Note I have crossed off many to many - this cardinality is not available in Excel and is not recommended in Power BI unless you are an expert with understanding relationships and really know you need it.

A typical star schema, has 1 to many cardinality for the relationships between the Fact table and each Dimension table. 

If we look back at our example from the previous post: 

FactTrainingAttendance

 EmpIDCourseID Date Result 
 1PB1 24-Jun-2020 95
 2EX1 20-Jun-2020 70
 1PB2 30-Jun-2020 80 
 3EX120-Jun-202080 

DimCourse

CourseIDCourse NameCourse CategoryLevel
 PB1Power BI EssentialsPower BIBeginner
 PB2Power BI Next LevelPower BI Intermediate
 EX1Excel EssentialsExcelBeginner

We can see that CourseID exists in both tables. In order to be able to use this as a key relationship column, we need to check a few things: 

  • Does the data type match? Yes! Both are text.
  • Are they in the same format? Yes! Both have two letter abbreviation followed by number with no spaces.
  • Is there a one to one or one to many cardinality? Hmm, what does that mean again? 

Remember we defined cardinality as the number of items in a set. I said you need to think of your set as a single unique value for the column. We have three sets to look at here; PB1, PB2 and EX1. In order to see if this relationship follows the cardinality we need, I must count the number of times each item appears in each table. 

DimCourse

CourseIDNumber of Times in FactTrainingAttendanceNumber of Times in DimCourseUnique in Dim table?
 PB111Yes
 PB21Yes
 EX12 (many)1Yes

 Note: You can use the Column Distribution option in Power Query Editor (tick the box in the View tab i the ribbon to turn on Column Distribution) to see the cardinality of each item for the first 1000 rows of your data. 

FactTrainingAttendance: CourseID Cardinality = many

In the image above, you can see that CourseID has more distinct values than unique values. This means that not every CourseID is unique in the Fact table, and that some CourseIDs appear multiple times in the FactTrainingAttendance table. The cardinality for CourseID in the fact table is many.

DimCourse: CourseID Cardinality = 1

In the DimCourse table, we can see that the number of distinct values is equal to the number of unique values. This means that every CourseID in the DimCourse table is unique and only appears one time in the Dimension table. The cardinality for CourseID in the dimension table is 1.

DimCourse to FactTrainingAttendance: Cardinality = 1 to many

Therefore if we use CourseID to relate DimCourse to FactTrainingAttendance, our cardinality will be 1 to many. Let's check that list again: 
  • Does the data type match? Yes! Both are text.
  • Are they in the same format? Yes! Both have two letter abbreviation followed by number with no spaces.
  • Is there a one to one or one to many cardinality? Yes! Cardinality is 1 to many.

BONUS: Choosing the right key

What if we change our data slightly and add a new column to the fact table. Imagine our data now looks like this: 

FactTrainingAttendance

 EmpIDCourse Number Course CategoryDate Result 
 1PB24-Jun-2020 95
 2EX20-Jun-2020 70
 1PB30-Jun-2020 80 
 31EX20-Jun-202080 

DimCourse

Course NumberCourse NameCourse CategoryLevel
 1Power BI EssentialsPBBeginner
 2Power BI Next LevelPB Intermediate
 1Excel EssentialsEXBeginner

Could we use Course Category and Course Category as our key columns for the relationship between these two tables? Let's review our checklist again: 

  • Does the data type match? Yes! Both are text.
  • Are they in the same format? Yes! Both spell out the entire category.
  • Is there a one to one or one to many cardinality? No! Cardinality is many to many.

Power BI category (PB) appears many times in both the Fact and the Dimension table. The key column in the dimension table must ALWAYS have a cardinality of one, meaning that every value in the key column of the dimension table must be unique. 

In this example, if we try to use Course Category as the key column for the dimension table, I would not know which Level to assign to the course on 24 Jun - I know that it's in the Power BI (PB) category, but when I check the DimCourse table for Course Category = PB, I have two choices; Beginner and Intermediate. Therefore, my results would be ambiguous. I have Power BI Essentials and Power BI Next Level to choose from.

If we use the CourseID, I know that the course on 24 Jun is Course Number 1. Again, Course Number 1 appears multiple times in the DimCourse table - again my results would be ambiguous. I have Power BI Essentials and Excel Essentials to choose from.

How do I know which course to choose? 

Power BI and Excel only allow us to use ONE column to create relationships between our tables, so I must create a single column unique CourseID. Looking at the above options:

Course Category as key gives - Power BI Essentials, Power BI Next Level

Course Number as key gives - Power BI Essentials, Excel Essentials

Power BI Essentials is the only course in both lists, so this must be the right choice. However, I can only know this if I look at BOTH Course Category AND Course Number.

So let's merge these into a single column in both tables: CourseID

FactTrainingAttendance

 EmpIDCourse Number Course CategoryDate Result CourseID
 1PB24-Jun-2020 95PB1
 2EX20-Jun-2020 70EX1
 1PB30-Jun-2020 80 PB2
 31EX20-Jun-202080 

EX1

DimCourse

Course NumberCourse NameCourse CategoryLevelCourseID
 1Power BI EssentialsPBBeginnerPB1
 2Power BI Next LevelPB IntermediatePB2
 1Excel EssentialsEXBeginnerEX1

If we use the new CourseID column as key, I know that the course on 24 Jun is CourseID PB1. PB1 is only in the DimCourse table once, so I know that it falls into Beginner Level. 

Conclusion

Make sure you understand your data well enough to verify that the relationships created in Power BI and Excel data models are correct. Fact tables should have cardinality of many. Dimension tables should have cardinality of one. Not all data models are as straightforward as this example, so reach out if you have any questions. 

If you're using Excel, this is all you need to worry about when creating relationships. If you are using Power BI, you have one more choice to make - Cross Filter direction. We will cover cross filter direction in Part 3 of the series. 

Custom Visual Review: Charticulator

This is not your ordinary custom visual - this is EVERY custom visual. Charticulator puts the power to design and develop custom visuals to ...