PowerBI Notebook

PowerBI sits at the presentation layer of the DA workflow. SQL and Python pull and analyze the data. PowerBI turns the results into interactive dashboards your team can use. This notebook covers the three skills that separate a strong PowerBI analyst from someone who only drags visuals onto a canvas.

For the data prep work that happens before PowerBI, see SQL Notebook and Pandas Notebook.

All examples use the parking system dataset: a fact table parking_transactions with dimension tables dim_station, dim_date, and dim_payment.


Part 1 — Data Modeling

Getting the data model right is the most important step. A bad model makes DAX harder to write, reports slower to run, and results easier to get wrong.

Star Schema

The star schema is the standard PowerBI data model. One central fact table connects to several dimension tables.

dim_date ──────────┐
dim_station ───────┤──── fact_parking_transactions
dim_payment_method ┘
Table type Contains Example
Fact table Numeric measures, foreign keys parking_transactions: amount, duration, station_id, date_id
Dimension table Descriptive attributes dim_station: station_name, city, station_type

Why not one flat table? A flat table (everything joined into one) works for small datasets, but it causes problems at scale: duplicate dimension data makes the file huge, and DAX filter behavior gets unpredictable. A star schema keeps the model clean and fast.

Relationships

Relationships tell PowerBI how the tables connect. Drag from the foreign key (in the fact table) to the primary key (in the dimension table) to create one.

Setting Recommended value Why
Cardinality Many-to-one (fact → dimension) Each transaction links to one station; one station has many transactions
Cross-filter direction Single Filters should flow from dimension → fact. Bi-directional causes ambiguity and slow queries
Active / Inactive One active relationship per pair of tables If two tables have several relationships, keep all but one inactive. Use USERELATIONSHIP in DAX to switch them on

Calculated Columns vs Measures

This is the most important modeling choice in DAX.

Calculated Column Measure
Evaluated Row by row at refresh time At query time, based on the current filters
Stored In the model (makes the file bigger) Not stored
Use for Categorizing, row-level labels Aggregations, KPIs, anything that changes by filter
Example duration_category = IF(duration > 120, "Long", "Short") Total Revenue = SUM(amount)

Default rule: use a measure unless you specifically need a row-level value to filter or group by. Calculated columns that contain aggregations (SUM, AVERAGE) are a common mistake — that logic belongs in a measure.

Date Table

Every PowerBI model that uses time intelligence needs a dedicated date dimension table. The dates must be continuous with no gaps.

dim_date
├── date (primary key, Date type)
├── year
├── quarter_num
├── quarter_label  (e.g. "2025 Q1")
├── month_num
├── month_name
├── week_num
├── day_of_week
└── is_weekend

Mark it as a Date Table in PowerBI (Table Tools → Mark as Date Table) and link it to every date column in the fact table. Without this step, time intelligence DAX functions will not work right.


Part 2 — DAX Syntax

DAX (Data Analysis Expressions) is the formula language for measures and calculated columns. Every formula runs inside a filter context — the set of filters from slicers, visual filters, and the row at the time the formula runs.

Basic Aggregations

Total Revenue    = SUM(fact_parking_transactions[amount])
Avg Amount       = AVERAGE(fact_parking_transactions[amount])
Transaction Count = COUNTROWS(fact_parking_transactions)
Unique Stations  = DISTINCTCOUNT(fact_parking_transactions[station_id])
Max Amount       = MAX(fact_parking_transactions[amount])

COUNTROWS counts rows in a table. COUNT counts non-blank values in a column. For most row-counting cases, COUNTROWS is clearer.

CALCULATE — Modify the Filter Context

CALCULATE is the most important DAX function. It runs an expression inside a changed filter context.

-- revenue from hourly parking only
Hourly Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    dim_payment[parking_type] = "hourly"
)

-- revenue from the previous month
Revenue Last Month =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    DATEADD(dim_date[date], -1, MONTH)
)

-- revenue ignoring the station filter (remove one filter)
All Stations Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    ALL(dim_station)
)

CALCULATE(expression, filter1, filter2, ...) adds or replaces filters. ALL() removes filters from a table or column — useful when you compute a share of a total.

FILTER — Row-Level Filtering

-- revenue from stations in Taipei only
Taipei Revenue =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    FILTER(dim_station, dim_station[city] = "Taipei")
)

-- count transactions over 500
High Value Transactions =
COUNTROWS(
    FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 500)
)

FILTER returns a table — use it inside CALCULATE or other table functions. For a simple "column equals value" filter, pass the condition directly to CALCULATE instead of using FILTER. It runs faster.

Safe Division — DIVIDE

-- AVOID: crashes if the denominator is 0
Revenue per Visit = [Total Revenue] / [Transaction Count]

-- CORRECT: returns blank (or your own fallback) when the denominator is 0
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count], 0)  -- return 0 instead of blank

Always use DIVIDE instead of / for any measure where the denominator could be zero.

IF and SWITCH

-- IF: two outcomes
Revenue Tier =
IF([Total Revenue] > 100000, "High", "Low")

-- IF with three outcomes (nested)
Revenue Tier =
IF([Total Revenue] > 100000, "High",
    IF([Total Revenue] > 50000, "Medium", "Low"))

-- SWITCH: cleaner for several conditions
Revenue Tier =
SWITCH(
    TRUE(),
    [Total Revenue] > 100000, "High",
    [Total Revenue] > 50000,  "Medium",
    "Low"                              -- default
)

SWITCH(TRUE(), condition1, result1, condition2, result2, ..., default) is the DAX version of CASE WHEN. Use it instead of nested IF when you have three or more branches.

Variables — VAR / RETURN

MoM Growth % =
VAR current_month = SUM(fact_parking_transactions[amount])
VAR prev_month    = CALCULATE(
                       SUM(fact_parking_transactions[amount]),
                       DATEADD(dim_date[date], -1, MONTH)
                   )
RETURN
    DIVIDE(current_month - prev_month, prev_month)

VAR stores an in-between result. RETURN says what the measure outputs. Variables: - Make complex measures readable — name each step. - Speed things up — each variable is calculated once, not every time it is used. - Let you reuse a value inside the same measure.

Time Intelligence

Time intelligence functions need a properly set up date table (see Data Modeling above).

-- year-to-date revenue
Revenue YTD =
TOTALYTD(SUM(fact_parking_transactions[amount]), dim_date[date])

-- same period last year
Revenue SPLY =
CALCULATE(
    SUM(fact_parking_transactions[amount]),
    SAMEPERIODLASTYEAR(dim_date[date])
)

-- month-over-month % change
MoM Growth % =
VAR curr = SUM(fact_parking_transactions[amount])
VAR prev = CALCULATE(
               SUM(fact_parking_transactions[amount]),
               DATEADD(dim_date[date], -1, MONTH)
           )
RETURN DIVIDE(curr - prev, prev)

-- rolling 3-month average
Rolling 3M Avg =
AVERAGEX(
    DATESINPERIOD(dim_date[date], LASTDATE(dim_date[date]), -3, MONTH),
    [Total Revenue]
)
Function What it returns
TOTALYTD Running total from Jan 1 to the current date
TOTALQTD Running total from the start of the quarter
TOTALMTD Running total from the start of the month
SAMEPERIODLASTYEAR The same date range, one year earlier
DATEADD A date range shifted by n periods
DATESINPERIOD A date range of a given length, ending on a date

Iterator Functions — SUMX, AVERAGEX

-- calculate revenue per transaction, then sum (row-level work)
Weighted Revenue =
SUMX(
    fact_parking_transactions,
    fact_parking_transactions[amount] * fact_parking_transactions[duration_mins]
)

-- average revenue only for transactions > 100
Avg High Value =
AVERAGEX(
    FILTER(fact_parking_transactions, fact_parking_transactions[amount] > 100),
    fact_parking_transactions[amount]
)

SUMX(table, expression) walks every row of a table, calculates the expression for each row, then adds them up. Use iterator functions when the work needs row-level logic before the total.

RELATED and RELATEDTABLE

-- in a calculated column: pull city from the related dimension table
Station City = RELATED(dim_station[city])

-- in a measure: count related rows from another table
Station Visit Count =
COUNTROWS(RELATEDTABLE(fact_parking_transactions))

RELATED follows a many-to-one link (from fact to dimension). RELATEDTABLE follows a one-to-many link (from dimension to fact).

RANKX

-- rank each station by total revenue (1 = highest)
Station Revenue Rank =
RANKX(
    ALL(dim_station),
    [Total Revenue],
    ,
    DESC,
    Dense
)

RANKX(table, expression, value, order, ties)ALL(dim_station) removes the current filter so every station is ranked against all the others.


Part 3 — Visualization Selection Guide

PowerBI visuals are set up through the interface, not in code. The skill is picking the right visual for the question — the same idea as the Visualization Selection Guide for Python charts.

Selection Table

Your question Visual
How does a metric change over time? Line chart
How do categories compare on one metric? Bar / Column chart
What is one KPI value? Card
How does performance compare to a target? Gauge / KPI visual
What share does each category have? Donut chart (≤ 5 categories)
How are two numeric variables linked? Scatter chart
How do values change across a grid (station × month)? Matrix
Where are values spread on a map? Map
What does the detailed breakdown look like? Table / Matrix

Line Chart — Trend Over Time

Use when: you want to show how a measure changes over a continuous time axis.

Avoid when: you have fewer than 4–5 time points — the line suggests a smooth flow that does not exist. Use a column chart instead.

Add a constant line for targets or averages: Analytics pane → Constant line. Use the secondary y-axis when you put two measures with different scales on the same chart (for example, revenue and transaction count).

Bar / Column Chart — Category Comparison

Use when: you want to compare a measure across separate categories (stations, parking types, payment methods).

  • Column chart: categories on the x-axis — best when category labels are short.
  • Bar chart: categories on the y-axis — best when category labels are long.
  • Clustered: compare sub-categories side by side.
  • Stacked: show breakdown and total at the same time.

Sort bars by value, not alphabetically, unless the category has a natural order (months, age groups).

Avoid when: you have more than 12–15 categories — use a table or filter to top N instead.

Card — Single KPI

Use when: you want to highlight one number prominently (total revenue, visit count, average amount).

Always pair a card with context: add a label that shows the comparison ("vs last month: +12%") or a target. A number with no context is not an insight.

Gauge — Performance vs Target

Use when: you want to show progress toward a specific target (monthly revenue vs budget).

Avoid when: you have several metrics to compare — a bar chart of actuals vs targets is easier to read for more than 2–3 metrics.

Donut / Pie Chart — Composition

Use when: you want to show the share of a total with 5 or fewer categories, and the audience cares about share more than exact values.

Avoid when: you have more than 5 categories, you need to compare values precisely, or you want to show change over time. A stacked bar chart handles all three cases better.

Scatter Chart — Link Between Two Measures

Use when: you want to check whether two numeric measures move together across categories (for example, does station size predict revenue?).

Set the Play axis to a date field to animate the scatter across time — useful for showing how the link changes month by month.

Matrix — Cross-Tabulation

Use when: you want to show a measure broken down by two dimensions at once (station × month, city × payment method).

Turn on conditional formatting (background color or data bars) so patterns are easy to spot — a plain number grid is hard to read.

Map — Geographic Distribution

Use when: location matters (which cities or regions have the most revenue).

Use Filled Map for areas (city, district), and Bubble Map for point locations (single stations). Make sure the location column is set to the right type (Data view → Column tools → Data category → City / Country / etc.).


Common Mistakes

1. Putting Aggregations in Calculated Columns

-- WRONG: calculated columns cannot aggregate across rows
Revenue Column = SUM(fact_parking_transactions[amount])   -- returns an error or the wrong value

-- CORRECT: aggregations belong in measures
Total Revenue = SUM(fact_parking_transactions[amount])

2. Using Bi-directional Relationships by Default

Bi-directional cross-filtering lets filters flow both ways between tables. It is rarely needed in a star schema, and it causes ambiguous filter paths and slower queries. Use single-direction (dimension → fact) unless you have a specific reason.

3. Missing or Misconfigured Date Table

Time intelligence functions (TOTALYTD, SAMEPERIODLASTYEAR, etc.) need a continuous date table marked as a Date Table. Missing dates cause wrong totals. Check that the date table has no gaps and covers the full range of your data.

4. Dividing with / Instead of DIVIDE

-- WRONG: a blank denominator causes an error
Revenue per Visit = [Total Revenue] / [Transaction Count]

-- CORRECT
Revenue per Visit = DIVIDE([Total Revenue], [Transaction Count])

5. Overusing Donut / Pie Charts

More than 5 slices become unreadable. Use a bar chart sorted by value instead — bar length is easier to compare than angle.

6. Forgetting ALL() in Share-of-Total Measures

-- WRONG: the denominator is also filtered, so it always returns 100%
Revenue Share = DIVIDE([Total Revenue], [Total Revenue])

-- CORRECT: remove filters from the denominator
Revenue Share =
DIVIDE(
    [Total Revenue],
    CALCULATE([Total Revenue], ALL(dim_station))
)