PowerBI Notebook
PowerBI sits at the presentation layer of the DA workflow. SQL and Python pull and analyze the data. PowerBI turns the results into interactive dashboards your team can use. This notebook covers the three skills that separate a strong PowerBI analyst from someone who only drags visuals onto a canvas.
For the data prep work that happens before PowerBI, see SQL Notebook and Pandas Notebook.
All examples use the parking system dataset: a fact table parking_transactions with dimension tables dim_station, dim_date, and dim_payment.
Part 1 — Data Modeling
Getting the data model right is the most important step. A bad model makes DAX harder to write, reports slower to run, and results easier to get wrong.
Star Schema
The star schema is the standard PowerBI data model. One central fact table connects to several dimension tables.
dim_date ──────────┐
dim_station ───────┤──── fact_parking_transactions
dim_payment_method ┘
| Table type | Contains | Example |
|---|---|---|
| Fact table | Numeric measures, foreign keys | parking_transactions: amount, duration, station_id, date_id |
| Dimension table | Descriptive attributes | dim_station: station_name, city, station_type |
Why not one flat table? A flat table (everything joined into one) works for small datasets, but it causes problems at scale: duplicate dimension data makes the file huge, and DAX filter behavior gets unpredictable. A star schema keeps the model clean and fast.
Relationships
Relationships tell PowerBI how the tables connect. Drag from the foreign key (in the fact table) to the primary key (in the dimension table) to create one.
| Setting | Recommended value | Why |
|---|---|---|
| Cardinality | Many-to-one (fact → dimension) | Each transaction links to one station; one station has many transactions |
| Cross-filter direction | Single | Filters should flow from dimension → fact. Bi-directional causes ambiguity and slow queries |
| Active / Inactive | One active relationship per pair of tables | If two tables have several relationships, keep all but one inactive. Use USERELATIONSHIP in DAX to switch them on |
Calculated Columns vs Measures
This is the most important modeling choice in DAX.
| Calculated Column | Measure | |
|---|---|---|
| Evaluated | Row by row at refresh time | At query time, based on the current filters |
| Stored | In the model (makes the file bigger) | Not stored |
| Use for | Categorizing, row-level labels | Aggregations, KPIs, anything that changes by filter |
| Example | duration_category = IF(duration > 120, "Long", "Short") |
Total Revenue = SUM(amount) |
Default rule: use a measure unless you specifically need a row-level value to filter or group by. Calculated columns that contain aggregations (SUM, AVERAGE) are a common mistake — that logic belongs in a measure.
Date Table
Every PowerBI model that uses time intelligence needs a dedicated date dimension table. The dates must be continuous with no gaps.
dim_date
├── date (primary key, Date type)
├── year
├── quarter_num
├── quarter_label (e.g. "2025 Q1")
├── month_num
├── month_name
├── week_num
├── day_of_week
└── is_weekend
Mark it as a Date Table in PowerBI (Table Tools → Mark as Date Table) and link it to every date column in the fact table. Without this step, time intelligence DAX functions will not work right.
Part 2 — DAX Syntax
DAX (Data Analysis Expressions) is the formula language for measures and calculated columns. Every formula runs inside a filter context — the set of filters from slicers, visual filters, and the row at the time the formula runs.
Basic Aggregations
Total Revenue = SUM(fact_parking_transactions[amount])
Avg Amount = AVERAGE(fact_parking_transactions[amount])
Transaction Count = COUNTROWS(fact_parking_transactions)
Unique Stations = DISTINCTCOUNT(fact_parking_transactions[station_id])
Max Amount = MAX(fact_parking_transactions[amount])
COUNTROWS counts rows in a table. COUNT counts non-blank values in a column. For most row-counting cases, COUNTROWS is clearer.
CALCULATE — Modify the Filter Context
CALCULATE is the most important DAX function. It runs an expression inside a changed filter context.
-- revenue from hourly parking only
Hourly Revenue =
CALCULATE(
SUM(fact_parking_transactions[amount]),
dim_payment[parking_type] = "hourly"
)
-- revenue from the previous month
Revenue Last Month =
CALCULATE(
SUM(fact_parking_transactions[amount]),
DATEADD(dim_date[date], -1, MONTH)
)
-- revenue ignoring the station filter (remove one filter)
All Stations Revenue =
CALCULATE(
SUM(fact_parking_transactions[amount]),
ALL(dim_station)
)
CALCULATE(expression, filter1, filter2, ...) adds or replaces filters. ALL() removes filters from a table or column — useful when you compute a share of a total.
FILTER — Row-Level Filtering
-- revenue from stations in Taipei only
Taipei Revenue =
CALCULATE(
SUM(fact_parking_transactions[amount]),
FILTER(dim_station, dim_station[city] = "Taipei")
)
-- count transactions over 500
High Value Transactions =
COUNTROWS(
FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 500)
)
FILTER returns a table — use it inside CALCULATE or other table functions. For a simple "column equals value" filter, pass the condition directly to CALCULATE instead of using FILTER. It runs faster.
Safe Division — DIVIDE
-- AVOID: crashes if the denominator is 0
Revenue per Visit = [Total Revenue] / [Transaction Count]
-- CORRECT: returns blank (or your own fallback) when the denominator is 0
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count], 0) -- return 0 instead of blank
Always use DIVIDE instead of / for any measure where the denominator could be zero.
IF and SWITCH
-- IF: two outcomes
Revenue Tier =
IF([Total Revenue] > 100000, "High", "Low")
-- IF with three outcomes (nested)
Revenue Tier =
IF([Total Revenue] > 100000, "High",
IF([Total Revenue] > 50000, "Medium", "Low"))
-- SWITCH: cleaner for several conditions
Revenue Tier =
SWITCH(
TRUE(),
[Total Revenue] > 100000, "High",
[Total Revenue] > 50000, "Medium",
"Low" -- default
)
SWITCH(TRUE(), condition1, result1, condition2, result2, ..., default) is the DAX version of CASE WHEN. Use it instead of nested IF when you have three or more branches.
Variables — VAR / RETURN
MoM Growth % =
VAR current_month = SUM(fact_parking_transactions[amount])
VAR prev_month = CALCULATE(
SUM(fact_parking_transactions[amount]),
DATEADD(dim_date[date], -1, MONTH)
)
RETURN
DIVIDE(current_month - prev_month, prev_month)
VAR stores an in-between result. RETURN says what the measure outputs. Variables:
- Make complex measures readable — name each step.
- Speed things up — each variable is calculated once, not every time it is used.
- Let you reuse a value inside the same measure.
Time Intelligence
Time intelligence functions need a properly set up date table (see Data Modeling above).
-- year-to-date revenue
Revenue YTD =
TOTALYTD(SUM(fact_parking_transactions[amount]), dim_date[date])
-- same period last year
Revenue SPLY =
CALCULATE(
SUM(fact_parking_transactions[amount]),
SAMEPERIODLASTYEAR(dim_date[date])
)
-- month-over-month % change
MoM Growth % =
VAR curr = SUM(fact_parking_transactions[amount])
VAR prev = CALCULATE(
SUM(fact_parking_transactions[amount]),
DATEADD(dim_date[date], -1, MONTH)
)
RETURN DIVIDE(curr - prev, prev)
-- rolling 3-month average
Rolling 3M Avg =
AVERAGEX(
DATESINPERIOD(dim_date[date], LASTDATE(dim_date[date]), -3, MONTH),
[Total Revenue]
)
| Function | What it returns |
|---|---|
TOTALYTD |
Running total from Jan 1 to the current date |
TOTALQTD |
Running total from the start of the quarter |
TOTALMTD |
Running total from the start of the month |
SAMEPERIODLASTYEAR |
The same date range, one year earlier |
DATEADD |
A date range shifted by n periods |
DATESINPERIOD |
A date range of a given length, ending on a date |
Iterator Functions — SUMX, AVERAGEX
-- calculate revenue per transaction, then sum (row-level work)
Weighted Revenue =
SUMX(
fact_parking_transactions,
fact_parking_transactions[amount] * fact_parking_transactions[duration_mins]
)
-- average revenue only for transactions > 100
Avg High Value =
AVERAGEX(
FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 100),
fact_parking_transactions[amount]
)
SUMX(table, expression) walks every row of a table, calculates the expression for each row, then adds them up. Use iterator functions when the work needs row-level logic before the total.
RELATED and RELATEDTABLE
-- in a calculated column: pull city from the related dimension table
Station City = RELATED(dim_station[city])
-- in a measure: count related rows from another table
Station Visit Count =
COUNTROWS(RELATEDTABLE(fact_parking_transactions))
RELATED follows a many-to-one link (from fact to dimension). RELATEDTABLE follows a one-to-many link (from dimension to fact).
RANKX
-- rank each station by total revenue (1 = highest)
Station Revenue Rank =
RANKX(
ALL(dim_station),
[Total Revenue],
,
DESC,
Dense
)
RANKX(table, expression, value, order, ties) — ALL(dim_station) removes the current filter so every station is ranked against all the others.
Part 3 — Visualization Selection Guide
PowerBI visuals are set up through the interface, not in code. The skill is picking the right visual for the question — the same idea as the Visualization Selection Guide for Python charts.
Selection Table
| Your question | Visual |
|---|---|
| How does a metric change over time? | Line chart |
| How do categories compare on one metric? | Bar / Column chart |
| What is one KPI value? | Card |
| How does performance compare to a target? | Gauge / KPI visual |
| What share does each category have? | Donut chart (≤ 5 categories) |
| How are two numeric variables linked? | Scatter chart |
| How do values change across a grid (station × month)? | Matrix |
| Where are values spread on a map? | Map |
| What does the detailed breakdown look like? | Table / Matrix |
Line Chart — Trend Over Time
Use when: you want to show how a measure changes over a continuous time axis.
Avoid when: you have fewer than 4–5 time points — the line suggests a smooth flow that does not exist. Use a column chart instead.
Add a constant line for targets or averages: Analytics pane → Constant line. Use the secondary y-axis when you put two measures with different scales on the same chart (for example, revenue and transaction count).
Bar / Column Chart — Category Comparison
Use when: you want to compare a measure across separate categories (stations, parking types, payment methods).
- Column chart: categories on the x-axis — best when category labels are short.
- Bar chart: categories on the y-axis — best when category labels are long.
- Clustered: compare sub-categories side by side.
- Stacked: show breakdown and total at the same time.
Sort bars by value, not alphabetically, unless the category has a natural order (months, age groups).
Avoid when: you have more than 12–15 categories — use a table or filter to top N instead.
Card — Single KPI
Use when: you want to highlight one number prominently (total revenue, visit count, average amount).
Always pair a card with context: add a label that shows the comparison ("vs last month: +12%") or a target. A number with no context is not an insight.
Gauge — Performance vs Target
Use when: you want to show progress toward a specific target (monthly revenue vs budget).
Avoid when: you have several metrics to compare — a bar chart of actuals vs targets is easier to read for more than 2–3 metrics.
Donut / Pie Chart — Composition
Use when: you want to show the share of a total with 5 or fewer categories, and the audience cares about share more than exact values.
Avoid when: you have more than 5 categories, you need to compare values precisely, or you want to show change over time. A stacked bar chart handles all three cases better.
Scatter Chart — Link Between Two Measures
Use when: you want to check whether two numeric measures move together across categories (for example, does station size predict revenue?).
Set the Play axis to a date field to animate the scatter across time — useful for showing how the link changes month by month.
Matrix — Cross-Tabulation
Use when: you want to show a measure broken down by two dimensions at once (station × month, city × payment method).
Turn on conditional formatting (background color or data bars) so patterns are easy to spot — a plain number grid is hard to read.
Map — Geographic Distribution
Use when: location matters (which cities or regions have the most revenue).
Use Filled Map for areas (city, district), and Bubble Map for point locations (single stations). Make sure the location column is set to the right type (Data view → Column tools → Data category → City / Country / etc.).
Common Mistakes
1. Putting Aggregations in Calculated Columns
-- WRONG: calculated columns cannot aggregate across rows
Revenue Column = SUM(fact_parking_transactions[amount]) -- returns an error or the wrong value
-- CORRECT: aggregations belong in measures
Total Revenue = SUM(fact_parking_transactions[amount])
2. Using Bi-directional Relationships by Default
Bi-directional cross-filtering lets filters flow both ways between tables. It is rarely needed in a star schema, and it causes ambiguous filter paths and slower queries. Use single-direction (dimension → fact) unless you have a specific reason.
3. Missing or Misconfigured Date Table
Time intelligence functions (TOTALYTD, SAMEPERIODLASTYEAR, etc.) need a continuous date table marked as a Date Table. Missing dates cause wrong totals. Check that the date table has no gaps and covers the full range of your data.
4. Dividing with / Instead of DIVIDE
-- WRONG: a blank denominator causes an error
Revenue per Visit = [Total Revenue] / [Transaction Count]
-- CORRECT
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])
5. Overusing Donut / Pie Charts
More than 5 slices become unreadable. Use a bar chart sorted by value instead — bar length is easier to compare than angle.
6. Forgetting ALL() in Share-of-Total Measures
-- WRONG: the denominator is also filtered, so it always returns 100%
Revenue Share = DIVIDE([Total Revenue], [Total Revenue])
-- CORRECT: remove filters from the denominator
Revenue Share =
DIVIDE(
[Total Revenue],
CALCULATE([Total Revenue], ALL(dim_station))
)