Get Data > Transform and Clean Data > Check Relationships > Visualize Data
UPDATE January 2023: Microsoft Learn have recently updated their explanation of star schema in Power BI with a much more useful explanation that can be found here:
https://learn.microsoft.com/en-us/power-bi/guidance/star-schema
Note that we have one row for each transaction; in this case a transaction is every attendee at a training event. This means that each event is listed multiple times in the Fact table, as we had multiple employees in attendance. It also means that each employee can be listed multiple times in the Fact table, as an employee can attend multiple training sessions.
The most common data model is a star schema, and consists of:
- Fact table - usually one per report (though it's possible to have multiple fact tables in one data model) that is the central table of transactions (sales, events, logs, requests, audits, etc).
- Dimension tables - as many as needed to store commonalities, demographic or dimension data about the transactions (customers, locations, dates, leave type, department, etc)
Note that Dimension tables have nothing to do with dimensions of objects (as pictured), except that in both cases you must ensure you have as many dimensions as needed to paint a clear picture of your desired result.
In a typical star schema, we create a relationship between the Fact table and each Dimension table based on key identifiers, or columns that both tables have in common. It is not always straightforward which table is the Fact table and which tables you need to use for Dimension tables, so let's start with a few examples.
Example 1: Employee Training
We will start by using employee training as an example, where you want to track the number and type of training that each employee has attended in the past month.
Fact Table: Training Attendance
Your Fact table might look something like this:
FactTrainingAttendance
EmpID | CourseID | Date | Result |
1 | PB1 | 24-Jun-2020 | 95 |
2 | EX1 | 20-Jun-2020 | 70 |
1 | PB2 | 30-Jun-2020 | 80 |
3 | EX1 | 20-Jun-2020 | 80 |
Note that we have one row for each transaction; in this case a transaction is every attendee at a training event. This means that each event is listed multiple times in the Fact table, as we had multiple employees in attendance. It also means that each employee can be listed multiple times in the Fact table, as an employee can attend multiple training sessions.
I can see at a glance that 3 employees have attended three different courses, but I do not have any specific information about those employees or the courses.
Dimension Tables: DimEmployee, DimCourse
What if I want to know the gender, job title or age group of those employees? What if I want to know the category of the course? In order to capture and report on this information, I will add two new Dimension tables to my model. The reason for using Dimension tables here is that the information in these tables stays static across many transactions:
- An employee does not change their birth date or gender each time they attend a course.
- A course does not change categories each time it is run (an Excel course is always in the Excel category, a SharePoint course will always be SharePoint category).
DimEmployee
EmpID | Name | Date of Birth | Gender |
1 | John | 24-Jun-1960 | M |
2 | Paul | 20-Aug-1988 | M |
3 | Mary | 14-Feb-1975 | F |
DimCourse
CourseID | Course Name | Course Category | Level |
PB1 | Power BI Essentials | Power BI | Beginner |
PB2 | Power BI Next Level | Power BI | Intermediate |
EX1 | Excel Essentials | Excel | Beginner |
Related Columns
Once you have identified the Fact and Dimension tables, you must identify the key identifying columns that relate the two tables. In Power BI we can only select one column from each table to make this connection. In this example, how can we relate the FactTrainingAttendance table to the DimEmployee table? How can we relate the FactTrainingAttendance table to the DimCourse table?
FactTrainingAttendance[EmpID] relates to DimEmployee[EmpID]
FactTrainingAttendance[CourseID] relates to DimCourse[CourseID]
Example 2: Incident Reporting
Let's use incident reporting for our second example describing Fact and Dimension tables, since it differs a bit from the typical sales examples already posted on the internet.
Fact table: Incident Reports
If you are reporting on incidents that have been reported over the past month, you might see a FactIncidentReports table with columns such as:
- IncidentID
- IncidentDate
- EmployeeID
- ReportID
Dimension tables: DimDate, DimEmployee, DimIncident
From that small amount of information, I can use the Dimensional (demographic/static) data that I already have, to find:
- Month
- Employee name, gender, age
- Incident type, severity
This information can be pulled from the Dimension tables:
DimDate
- Date
- Year
- Quarter
- Month
- Month Number
- Day of Week
DimEmployee
- EmployeeID
- Employee Name
- Gender
- Date of Birth
DimIncident
- IncidentID
- Incident Name
- Incident Type
- Severity
Related Columns
If we focus on the DimEmployee table, I only need to pull in data for each employee ONCE, to save space in my data model. We can use the EmployeeID column to connect the DimEmployee table, and all the data in it, to every incident that employee reports in the FactIncidentReports table, which may be MANY over the course of a month or year.
FactIncidentReports[IncidentID] relates to DimIncident[IncidentID]
FactIncidentReports[IncidentDate] relates to DimDate[Date]
FactIncidentReports[EmployeeID] relates to DimEmployee[EmployeeID]
Star Schema
Now that we've seen a few examples, you can understand the common identifying features of a Star Schema data model:
- a Fact table - one row per transaction, few columns to avoid repeating information
- a Dimension table - holds static details that relate to a key identifier in the Fact table, saves space in data model by avoiding duplicate information or double data entry
Once you have identified your Fact and Dimension tables, you are now ready for the next step - creating relationships between them. In the next post we will look at how to create these relationships and something called Cardinality.