PowerBI Notebook

PowerBI sits at the presentation layer of the DA workflow: SQL and Python handle extraction and analysis, PowerBI turns the results into interactive dashboards for stakeholders. This notebook covers the three skills that separate a capable PowerBI analyst from someone who just drags visuals onto a canvas.

For the data preparation that happens before PowerBI, see SQL Notebook and Pandas Notebook.

All examples use the parking system dataset: fact table parking_transactions with dimension tables dim_station, dim_date, and dim_payment.


Part 1 — Data Modeling

Getting the data model right is the most important step. A bad model makes DAX harder to write, reports slower to run, and results easier to get wrong.

Star Schema

The star schema is the standard PowerBI data model. One central fact table connects to multiple dimension tables.

dim_date ──────────┐
dim_station ───────┤──── fact_parking_transactions
dim_payment_method ┘
Table type Contains Example
Fact table Numeric measures, foreign keys parking_transactions: amount, duration, station_id, date_id
Dimension table Descriptive attributes dim_station: station_name, city, station_type

Why not one flat table? A flat table (all columns joined together) works for small datasets but causes problems at scale: duplicate dimension data inflates file size, and DAX filter propagation becomes unpredictable. Star schema keeps the model clean and performant.

Relationships

Relationships tell PowerBI how tables connect. Drag from the foreign key (fact) to the primary key (dimension) to create one.

Setting Recommended value Why
Cardinality Many-to-one (fact → dimension) Each transaction links to one station; one station appears in many transactions
Cross-filter direction Single Filters flow from dimension → fact. Bi-directional causes ambiguity and performance issues
Active / Inactive One active relationship per table pair Multiple relationships between the same tables must be inactive; use USERELATIONSHIP in DAX to activate them

Calculated Columns vs Measures

This is the most important modeling decision in DAX.

Calculated Column Measure
Evaluated Row by row at refresh At query time, based on filter context
Stored In the model (increases file size) Not stored
Use for Categorisation, row-level labels Aggregations, KPIs, anything that changes by filter
Example duration_category = IF(duration > 120, "Long", "Short") Total Revenue = SUM(amount)

Default rule: use a measure unless you specifically need a row-level value to filter or group by. Calculated columns that contain aggregations (SUM, AVERAGE) are a common mistake — that logic belongs in a measure.

Date Table

Every PowerBI model with time intelligence needs a dedicated date dimension table covering a continuous range with no gaps.

dim_date
├── date (primary key, Date type)
├── year
├── quarter_num
├── quarter_label  (e.g. "2025 Q1")
├── month_num
├── month_name
├── week_num
├── day_of_week
└── is_weekend

Mark it as a Date Table in PowerBI (Table Tools → Mark as Date Table) and link it to every date column in the fact table. Without this, time intelligence DAX functions will not work correctly.


Part 2 — DAX Syntax

DAX (Data Analysis Expressions) is the formula language for measures and calculated columns. Every formula evaluates in a filter context — the set of filters applied by slicers, visual filters, and row context at the time of evaluation.

Basic Aggregations

Total Revenue    = SUM(fact_parking_transactions[amount])
Avg Amount       = AVERAGE(fact_parking_transactions[amount])
Transaction Count = COUNTROWS(fact_parking_transactions)
Unique Stations  = DISTINCTCOUNT(fact_parking_transactions[station_id])
Max Amount       = MAX(fact_parking_transactions[amount])

COUNTROWS counts rows in a table. COUNT counts non-blank values in a column. For most row-counting use cases, COUNTROWS is clearer.

CALCULATE — Modify Filter Context

CALCULATE is the most important DAX function. It evaluates an expression in a modified filter context.

-- revenue from hourly parking only
Hourly Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    dim_payment[parking_type] = "hourly"
)

-- revenue from the previous month
Revenue Last Month =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    DATEADD(dim_date[date], -1, MONTH)
)

-- revenue ignoring the station filter (remove one filter)
All Stations Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    ALL(dim_station)
)

CALCULATE(expression, filter1, filter2, ...) adds or overrides filters. ALL() removes filters from a table or column — useful for computing share of total.

FILTER — Row-Level Filtering

-- revenue from stations in Taipei only
Taipei Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    FILTER(dim_station, dim_station[city] = "Taipei")
)

-- count transactions over 500
High Value Transactions =
COUNTROWS(
    FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 500)
)

FILTER returns a table — use it inside CALCULATE or other table functions. For simple column equality filters, pass the condition directly to CALCULATE instead of using FILTER (it's faster).

Safe Division — DIVIDE

-- AVOID: crashes if denominator is 0
Revenue per Visit = [Total Revenue] / [Transaction Count]

-- CORRECT: returns blank (or a specified fallback) when denominator is 0
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count], 0)  -- return 0 instead of blank

Always use DIVIDE instead of / for any measure that could have a zero denominator.

IF and SWITCH

-- IF: two outcomes
Revenue Tier =
IF([Total Revenue] > 100000, "High", "Low")

-- IF with three outcomes (nested)
Revenue Tier =
IF([Total Revenue] > 100000, "High",
    IF([Total Revenue] > 50000, "Medium", "Low"))

-- SWITCH: cleaner for multiple conditions
Revenue Tier =
SWITCH(
    TRUE(),
    [Total Revenue] > 100000, "High",
    [Total Revenue] > 50000,  "Medium",
    "Low"                              -- default
)

SWITCH(TRUE(), condition1, result1, condition2, result2, ..., default) is the DAX equivalent of CASE WHEN. Prefer it over nested IF for three or more branches.

Variables — VAR / RETURN

MoM Growth % =
VAR current_month = SUM(fact_parking_transactions[amount])
VAR prev_month    = CALCULATE(
                       SUM(fact_parking_transactions[amount]),
                       DATEADD(dim_date[date], -1, MONTH)
                   )
RETURN
    DIVIDE(current_month - prev_month, prev_month)

VAR stores an intermediate result. RETURN specifies what the measure outputs. Variables: - Make complex measures readable — name each step. - Improve performance — each variable is calculated once, not repeatedly. - Enable reuse within the same measure.

Time Intelligence

Time intelligence functions require a properly configured date table (see Data Modeling section).

-- year-to-date revenue
Revenue YTD =
TOTALYTD(SUM(fact_parking_transactions[amount]), dim_date[date])

-- same period last year
Revenue SPLY =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    SAMEPERIODLASTYEAR(dim_date[date])
)

-- month-over-month % change
MoM Growth % =
VAR curr = SUM(fact_parking_transactions[amount])
VAR prev = CALCULATE(
               SUM(fact_parking_transactions[amount]),
               DATEADD(dim_date[date], -1, MONTH)
           )
RETURN DIVIDE(curr - prev, prev)

-- rolling 3-month average
Rolling 3M Avg =
AVERAGEX(
    DATESINPERIOD(dim_date[date], LASTDATE(dim_date[date]), -3, MONTH),
    [Total Revenue]
)
Function What it returns
TOTALYTD Cumulative total from Jan 1 to current date
TOTALQTD Cumulative total from start of quarter
TOTALMTD Cumulative total from start of month
SAMEPERIODLASTYEAR Same date range, one year back
DATEADD Date range shifted by n periods
DATESINPERIOD Date range of a specified length ending on a date

Iterator Functions — SUMX, AVERAGEX

-- calculate revenue per transaction, then sum (row-level calculation)
Weighted Revenue =
SUMX(
    fact_parking_transactions,
    fact_parking_transactions[amount] * fact_parking_transactions[duration_mins]
)

-- average revenue only for transactions > 100
Avg High Value =
AVERAGEX(
    FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 100),
    fact_parking_transactions[amount]
)

SUMX(table, expression) iterates over each row of a table, evaluates the expression per row, then aggregates. Use iterator functions when the calculation requires row-level logic before aggregating.

RELATED and RELATEDTABLE

-- in a calculated column: pull city from the related dimension table
Station City = RELATED(dim_station[city])

-- in a measure: count related rows from another table
Station Visit Count =
COUNTROWS(RELATEDTABLE(fact_parking_transactions))

RELATED follows a many-to-one relationship (from fact to dimension). RELATEDTABLE follows a one-to-many relationship (from dimension to fact).

RANKX

-- rank each station by total revenue (1 = highest)
Station Revenue Rank =
RANKX(
    ALL(dim_station),
    [Total Revenue],
    ,
    DESC,
    Dense
)

RANKX(table, expression, value, order, ties)ALL(dim_station) removes the current filter so every station is ranked against all others.


Part 3 — Visualization Selection Guide

PowerBI visuals are configured through the interface, not code. The skill here is choosing the right visual for the question — the same principle as the Visualization Selection Guide for Python charts.

Selection Table

Your question Visual
How does a metric change over time? Line chart
How do categories compare on one metric? Bar / Column chart
What is a single KPI value? Card
How does performance compare to a target? Gauge / KPI visual
What share does each category contribute? Donut chart (≤ 5 categories)
How do two numeric variables relate? Scatter chart
How do values vary across a grid (station × month)? Matrix
Where are values distributed geographically? Map
What does the detailed breakdown look like? Table / Matrix

Line Chart — Trend Over Time

Use when: showing how a measure changes over a continuous time axis.

Avoid when: you have fewer than 4–5 time points — the line implies continuity that does not exist. Use a column chart instead.

Add a constant line for targets or averages: Analytics pane → Constant line. Use the secondary y-axis when overlaying two measures with different scales (e.g., revenue and transaction count).

Bar / Column Chart — Categorical Comparison

Use when: comparing a measure across discrete categories (stations, parking types, payment methods).

  • Column chart: categories on the x-axis — best when category labels are short.
  • Bar chart: categories on the y-axis — best when category labels are long.
  • Clustered: compare sub-categories side by side.
  • Stacked: show composition and total simultaneously.

Sort bars by value, not alphabetically, unless the category has a natural order (months, age groups).

Avoid when: you have more than 12–15 categories — use a table or filter to top N instead.

Card — Single KPI

Use when: highlighting one number prominently (total revenue, visit count, average amount).

Always pair a card with context: add a reference label showing the comparison period ("vs last month: +12%") or a target. A number without context is not an insight.

Gauge — Performance vs Target

Use when: showing progress toward a specific target (monthly revenue vs budget).

Avoid when: you have multiple metrics to compare — a bar chart of actuals vs targets is easier to read for more than 2–3 metrics.

Donut / Pie Chart — Composition

Use when: showing the share of a total with 5 or fewer categories, and the audience cares about proportions rather than exact values.

Avoid when: you have more than 5 categories, you need to compare values precisely, or you want to show change over time. A stacked bar chart handles all three cases better.

Scatter Chart — Relationship Between Two Measures

Use when: exploring whether two numeric measures are correlated across categories (e.g., does station size predict revenue?).

Set the Play axis to a date field to animate the scatter across time — useful for showing how the relationship evolves.

Matrix — Cross-Tabulation

Use when: showing a measure broken down by two dimensions simultaneously (station × month, city × payment method).

Enable conditional formatting (background colour or data bars) to make patterns visible at a glance — a plain number grid is hard to read.

Map — Geographic Distribution

Use when: the geographic dimension matters (which cities or regions have the most revenue).

Use Filled Map for regions (city, district), Bubble Map for point locations (individual stations). Ensure the location column is categorised correctly (Data view → Column tools → Data category → City / Country / etc.).


Common Mistakes

1. Putting Aggregations in Calculated Columns

-- WRONG: calculated columns cannot aggregate across rows
Revenue Column = SUM(fact_parking_transactions[amount])   -- returns error or wrong value

-- CORRECT: aggregations belong in measures
Total Revenue = SUM(fact_parking_transactions[amount])

2. Using Bi-directional Relationships by Default

Bi-directional cross-filtering allows filters to flow both ways between tables. It is rarely needed in a star schema and causes ambiguous filter paths and slower queries. Use single-direction (dimension → fact) unless there is a specific reason.

3. Missing or Misconfigured Date Table

Time intelligence functions (TOTALYTD, SAMEPERIODLASTYEAR, etc.) require a continuous date table marked as a Date Table. Missing dates in the range cause incorrect totals. Verify the date table has no gaps and covers the full data range.

4. Dividing with / Instead of DIVIDE

-- WRONG: blank denominator causes an error
Revenue per Visit = [Total Revenue] / [Transaction Count]

-- CORRECT
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])

5. Overusing Donut / Pie Charts

More than 5 slices become unreadable. Replace with a bar chart sorted by value — bar length is easier to compare than angle.

6. Forgetting ALL() in Share-of-Total Measures

-- WRONG: denominator is also filtered, always returns 100%
Revenue Share = DIVIDE([Total Revenue], [Total Revenue])

-- CORRECT: remove filters from the denominator
Revenue Share =
DIVIDE(
    [Total Revenue],
    CALCULATE([Total Revenue], ALL(dim_station))
)