SQL Patterns
SQL Patterns are reusable solutions to problems you'll encounter repeatedly — in work and in interviews. Each pattern combines the building blocks from SQL Notes into a complete approach.
All examples use the same parking system tables: parking_history, payment_history, management_table, and charging_records.
Pattern Selection Guide
Use this table to find the right pattern before reading the full example.
| Your problem | Pattern |
|---|---|
| Keep only the best / latest record per group | 1. De-duplication |
| Find the Nth largest or smallest value | 2. Nth Highest Value |
| Compare this period vs. the previous one | 3. Period-over-Period |
| Show cumulative totals or a smoothed trend line | 4. Running Total & Moving Average |
| Measure drop-off between steps in a flow | 5. Funnel Analysis |
| Find consecutive streaks or missing dates | 6. Gap & Island |
| Track how many users return month after month | 7. Cohort Analysis |
| Get the top N records per group (not just top 1) | 8. Top N per Group |
| Find records that exist in A but not in B | 9. Anti Join |
| Compute multiple conditional metrics in one pass | 10. Conditional Aggregation |
When no single pattern fits, see Multi-Pattern Composition at the end of this article.
Pattern 1: De-duplication (Keep One Row Per Group)
Problem: Each license plate has multiple parking sessions. Get only the most recent session per plate.
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY license_plate
ORDER BY entry_time DESC
) AS rn
FROM parking_history
)
SELECT license_plate, station_code, entry_time, exit_time
FROM ranked
WHERE rn = 1;
PARTITION BY license_plate— restart the counter for each car.ORDER BY entry_time DESC— the latest session getsrn = 1.- Filter
WHERE rn = 1— keep only the winner per group.
This pattern answers: "Give me the latest / cheapest / highest record per group."
Swap ORDER BY direction and column to change what "best" means.
Variant: Remove Exact Duplicate Rows
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY license_plate, entry_time, station_code
ORDER BY (SELECT NULL)
) AS rn
FROM parking_history
)
SELECT * FROM ranked WHERE rn = 1;
PARTITION BY all columns that define a duplicate. ORDER BY (SELECT NULL) means "no preference" — just pick one.
Template
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY <group_col> ORDER BY <sort_col> DESC) AS rn
FROM <table>
)
SELECT * FROM ranked WHERE rn = 1;
Pattern 2: Nth Highest Value
Problem: Find the 3rd highest single payment ever made.
Method 1: DENSE_RANK (recommended)
WITH ranked AS (
SELECT amount_received,
DENSE_RANK() OVER (ORDER BY amount_received DESC) AS rnk
FROM payment_history
)
SELECT amount_received
FROM ranked
WHERE rnk = 3;
Method 2: Subquery + LIMIT
SELECT MIN(amount_received) AS third_highest
FROM (
SELECT DISTINCT amount_received
FROM payment_history
ORDER BY amount_received DESC
LIMIT 3
) AS top3;
- Use DENSE_RANK when ties should share the same rank (1, 2, 2, 3 ...).
- Use ROW_NUMBER when you want strictly the Nth row, ignoring ties.
DISTINCTin the subquery ensures tied values count as one position.
Template
WITH ranked AS (
SELECT <value_col>,
DENSE_RANK() OVER (ORDER BY <value_col> DESC) AS rnk
FROM <table>
)
SELECT <value_col> FROM ranked WHERE rnk = <N>;
Pattern 3: Period-over-Period Comparison (MoM / YoY)
Problem: Calculate monthly charging spend and the change vs. the previous month.
WITH monthly AS (
SELECT DATE_FORMAT(paid_time, '%Y-%m') AS month,
SUM(amount_received) AS total_spent
FROM payment_history
GROUP BY DATE_FORMAT(paid_time, '%Y-%m')
),
with_lag AS (
SELECT month,
total_spent,
LAG(total_spent) OVER (ORDER BY month) AS prev_month
FROM monthly
)
SELECT month,
total_spent,
prev_month,
total_spent - prev_month AS mom_change,
ROUND((total_spent - prev_month) / prev_month * 100, 1) AS mom_pct
FROM with_lag
ORDER BY month;
- First CTE: aggregate to the period you want (month, quarter, year).
- Second CTE: use
LAGto pull the previous period's value into the current row. - Final SELECT: subtract and divide — never repeat the
LAG(...)expression in the same SELECT.
For YoY, use LAG(total_spent, 12) to look back 12 rows instead of 1.
Template
WITH periods AS (
SELECT <period_expr> AS period,
<agg_func>(<value_col>) AS metric
FROM <table>
GROUP BY <period_expr>
),
with_lag AS (
SELECT period, metric,
LAG(metric) OVER (ORDER BY period) AS prev_metric
FROM periods
)
SELECT period, metric, prev_metric,
metric - prev_metric AS change,
ROUND((metric - prev_metric) / prev_metric * 100, 1) AS pct_change
FROM with_lag;
Pattern 4: Running Total & Moving Average
Problem: Show cumulative spend and a 3-month moving average alongside monthly figures.
WITH monthly AS (
SELECT DATE_FORMAT(paid_time, '%Y-%m') AS month,
SUM(amount_received) AS total_spent
FROM payment_history
GROUP BY DATE_FORMAT(paid_time, '%Y-%m')
)
SELECT month,
total_spent,
SUM(total_spent) OVER (
ORDER BY month
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
ROUND(AVG(total_spent) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 0) AS moving_avg_3m
FROM monthly
ORDER BY month;
Window Frame Reference
| Frame | Meaning |
|---|---|
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
First row up to now → running total |
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW |
Current + 2 rows back → 3-period moving avg |
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING |
Centered 3-period smoothing |
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
Entire partition |
Without a frame clause, SUM() OVER (ORDER BY ...) defaults to UNBOUNDED PRECEDING AND CURRENT ROW — a running total. Always write it explicitly to be safe.
Template
WITH aggregated AS (
SELECT <period_expr> AS period,
<agg_func>(<value_col>) AS metric
FROM <table>
GROUP BY <period_expr>
)
SELECT period, metric,
SUM(metric) OVER (ORDER BY period
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
AVG(metric) OVER (ORDER BY period
ROWS BETWEEN <N-1> PRECEDING AND CURRENT ROW) AS moving_avg
FROM aggregated;
Pattern 5: Funnel Analysis
Problem: Of all cars that entered, how many completed payment? What's the drop-off rate?
WITH funnel AS (
SELECT
COUNT(DISTINCT ph.license_plate) AS step1_entered,
COUNT(DISTINCT CASE
WHEN pay.parking_id IS NOT NULL
THEN ph.license_plate END) AS step2_paid,
COUNT(DISTINCT CASE
WHEN pay.parking_id IS NOT NULL
AND pay.amount_received > 0
THEN ph.license_plate END) AS step3_paid_nonzero
FROM parking_history ph
LEFT JOIN payment_history pay
ON ph.parking_id = pay.parking_id
)
SELECT step1_entered,
step2_paid,
step3_paid_nonzero,
ROUND(step2_paid / step1_entered * 100, 1) AS pct_to_step2,
ROUND(step3_paid_nonzero / step2_paid * 100, 1) AS pct_to_step3
FROM funnel;
The core pattern is LEFT JOIN + COUNT(DISTINCT CASE WHEN ... IS NOT NULL THEN id END).
- LEFT JOIN keeps all users from the first step, even if they never reached the next.
- CASE WHEN IS NOT NULL counts only those who did reach the next step.
Each column represents one funnel stage. Divide adjacent stages for the conversion rate.
Template
WITH funnel AS (
SELECT
COUNT(DISTINCT <user_id>) AS step1,
COUNT(DISTINCT CASE WHEN <step2_condition> THEN <user_id> END) AS step2,
COUNT(DISTINCT CASE WHEN <step3_condition> THEN <user_id> END) AS step3
FROM <base_table>
LEFT JOIN <next_table> ON <join_condition>
)
SELECT step1, step2, step3,
ROUND(step2 / step1 * 100, 1) AS pct_step2,
ROUND(step3 / step2 * 100, 1) AS pct_step3
FROM funnel;
Pattern 6: Gap & Island
Finding Islands (Consecutive Streaks)
Problem: Find each station's consecutive-day activity streaks and how long each lasted.
WITH daily AS (
SELECT station_code,
DATE(entry_time) AS activity_date
FROM parking_history
GROUP BY station_code, DATE(entry_time)
),
grouped AS (
SELECT station_code,
activity_date,
DATE_SUB(
activity_date,
INTERVAL ROW_NUMBER() OVER (
PARTITION BY station_code
ORDER BY activity_date
) DAY
) AS grp
FROM daily
)
SELECT station_code,
MIN(activity_date) AS streak_start,
MAX(activity_date) AS streak_end,
COUNT(*) AS streak_days
FROM grouped
GROUP BY station_code, grp
ORDER BY station_code, streak_start;
The island trick: subtract the row number from the date. Consecutive dates decrease by 1 each row while the row number also increases by 1 — so their difference stays constant. Non-consecutive dates produce a different constant, forming a new group.
Finding Gaps (Missing Days)
Problem: Find dates in 2025 where no payments were made.
WITH RECURSIVE date_series AS (
SELECT DATE('2025-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY)
FROM date_series
WHERE dt < '2025-12-31'
),
active_days AS (
SELECT DISTINCT DATE(paid_time) AS dt
FROM payment_history
WHERE paid_time BETWEEN '2025-01-01' AND '2025-12-31'
)
SELECT ds.dt AS gap_date
FROM date_series ds
LEFT JOIN active_days ad ON ds.dt = ad.dt
WHERE ad.dt IS NULL
ORDER BY ds.dt;
WITH RECURSIVE generates a complete date series. LEFT JOIN + WHERE IS NULL then reveals which dates had no activity. This is the anti-join pattern applied to time.
Template
-- Islands (consecutive streaks)
WITH daily AS (
SELECT <group_col>, DATE(<date_col>) AS dt
FROM <table> GROUP BY <group_col>, DATE(<date_col>)
),
grouped AS (
SELECT <group_col>, dt,
DATE_SUB(dt, INTERVAL ROW_NUMBER() OVER (
PARTITION BY <group_col> ORDER BY dt) DAY) AS grp
FROM daily
)
SELECT <group_col>, MIN(dt) AS start, MAX(dt) AS end, COUNT(*) AS length
FROM grouped GROUP BY <group_col>, grp;
-- Gaps (missing dates)
WITH RECURSIVE dates AS (
SELECT DATE('<start>') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM dates WHERE dt < '<end>'
),
active AS (SELECT DISTINCT DATE(<date_col>) AS dt FROM <table>)
SELECT d.dt FROM dates d LEFT JOIN active a ON d.dt = a.dt WHERE a.dt IS NULL;
Pattern 7: Cohort Analysis
Problem: Group cars by the month they first parked. Track how many returned in each subsequent month.
WITH first_visit AS (
SELECT license_plate,
DATE_FORMAT(MIN(entry_time), '%Y-%m') AS cohort_month
FROM parking_history
GROUP BY license_plate
),
activity AS (
SELECT ph.license_plate,
fv.cohort_month,
TIMESTAMPDIFF(
MONTH,
STR_TO_DATE(CONCAT(fv.cohort_month, '-01'), '%Y-%m-%d'),
DATE(ph.entry_time)
) AS month_offset
FROM parking_history ph
JOIN first_visit fv ON ph.license_plate = fv.license_plate
)
SELECT cohort_month,
COUNT(DISTINCT CASE WHEN month_offset = 0 THEN license_plate END) AS m0,
COUNT(DISTINCT CASE WHEN month_offset = 1 THEN license_plate END) AS m1,
COUNT(DISTINCT CASE WHEN month_offset = 2 THEN license_plate END) AS m2,
COUNT(DISTINCT CASE WHEN month_offset = 3 THEN license_plate END) AS m3,
COUNT(DISTINCT CASE WHEN month_offset = 4 THEN license_plate END) AS m4
FROM activity
GROUP BY cohort_month
ORDER BY cohort_month;
Steps:
1. first_visit — find each user's cohort (the month they first appeared).
2. activity — join back to all sessions; TIMESTAMPDIFF turns each visit date into "months since cohort".
3. Final SELECT — pivot month_offset into columns using the CASE WHEN pivot pattern from SQL Notes.
m0 is always the cohort size. m1 / m0 is the Month-1 retention rate.
Template
WITH first_event AS (
SELECT <user_id>,
DATE_FORMAT(MIN(<event_date>), '%Y-%m') AS cohort_month
FROM <table> GROUP BY <user_id>
),
activity AS (
SELECT t.<user_id>, fe.cohort_month,
TIMESTAMPDIFF(MONTH,
STR_TO_DATE(CONCAT(fe.cohort_month, '-01'), '%Y-%m-%d'),
DATE(t.<event_date>)) AS month_offset
FROM <table> t JOIN first_event fe ON t.<user_id> = fe.<user_id>
)
SELECT cohort_month,
COUNT(DISTINCT CASE WHEN month_offset = 0 THEN <user_id> END) AS m0,
COUNT(DISTINCT CASE WHEN month_offset = 1 THEN <user_id> END) AS m1,
COUNT(DISTINCT CASE WHEN month_offset = 2 THEN <user_id> END) AS m2
FROM activity GROUP BY cohort_month ORDER BY cohort_month;
Pattern 8: Top N per Group
Problem: Get the top 3 most recent parking sessions for each station.
WITH ranked AS (
SELECT station_code, license_plate, entry_time,
ROW_NUMBER() OVER (
PARTITION BY station_code
ORDER BY entry_time DESC
) AS rn
FROM parking_history
)
SELECT station_code, license_plate, entry_time
FROM ranked
WHERE rn <= 3;
Change <= 3 to <= N for any N. The difference from de-duplication is rn = 1 vs rn <= N.
Handling Ties
If ties should all be included (e.g. 3rd place has two records — show both):
WITH ranked AS (
SELECT station_code, license_plate, amount_received,
DENSE_RANK() OVER (
PARTITION BY station_code
ORDER BY amount_received DESC
) AS rnk
FROM payment_history
)
SELECT station_code, license_plate, amount_received
FROM ranked
WHERE rnk <= 3;
ROW_NUMBER: strict top N, ties broken arbitrarily.DENSE_RANK: top N distinct values, tied rows all included.RANK: top N rows by rank number — may return more than N rows if there are ties before position N.
Template
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY <group_col> ORDER BY <sort_col> DESC) AS rn
FROM <table>
)
SELECT * FROM ranked WHERE rn <= <N>;
Pattern 9: Anti Join (Records with No Match)
Problem: Find all parking sessions that have no corresponding payment record.
Three equivalent approaches — each with different trade-offs:
Method 1: LEFT JOIN + IS NULL (most common)
SELECT ph.license_plate, ph.station_code, ph.entry_time
FROM parking_history ph
LEFT JOIN payment_history pay
ON ph.parking_id = pay.parking_id
WHERE pay.parking_id IS NULL;
Keep all rows from the left table, then filter to those where the join produced no match. Fast and universally supported.
Method 2: NOT EXISTS (most readable for complex conditions)
SELECT ph.license_plate, ph.station_code, ph.entry_time
FROM parking_history ph
WHERE NOT EXISTS (
SELECT 1
FROM payment_history pay
WHERE pay.parking_id = ph.parking_id
);
Stops searching as soon as one match is found — can be faster when matches are common. SELECT 1 is conventional; only existence matters.
Method 3: NOT IN (use with caution)
SELECT license_plate, station_code, entry_time
FROM parking_history
WHERE parking_id NOT IN (
SELECT parking_id FROM payment_history
);
Warning: if the subquery returns any NULL, NOT IN returns no rows at all — a silent bug. Only use NOT IN when you can guarantee the subquery column has no NULLs.
When to use which
| Method | Best when |
|---|---|
| LEFT JOIN + IS NULL | Simple join condition, general-purpose |
| NOT EXISTS | Multi-column or complex match conditions |
| NOT IN | Subquery column is guaranteed NOT NULL |
Template
-- LEFT JOIN + IS NULL
SELECT a.* FROM <table_a> a
LEFT JOIN <table_b> b ON a.<key> = b.<key>
WHERE b.<key> IS NULL;
-- NOT EXISTS
SELECT * FROM <table_a> a
WHERE NOT EXISTS (
SELECT 1 FROM <table_b> b WHERE b.<key> = a.<key>
);
Pattern 10: Conditional Aggregation
Problem: Calculate multiple metrics sliced by a condition in a single query pass — without subqueries.
Count / Sum by Condition
SELECT station_code,
COUNT(*) AS total_sessions,
COUNT(CASE WHEN parking_type = 'hourly' THEN 1 END) AS hourly_count,
COUNT(CASE WHEN parking_type = 'monthly' THEN 1 END) AS monthly_count,
SUM(CASE WHEN parking_type = 'hourly' THEN 1 ELSE 0 END) AS hourly_sum,
ROUND(
COUNT(CASE WHEN parking_type = 'monthly' THEN 1 END)
/ COUNT(*) * 100, 1
) AS monthly_pct
FROM parking_history
GROUP BY station_code;
COUNT(CASE WHEN ... THEN 1 END) — omitting ELSE means non-matching rows return NULL, which COUNT ignores. Equivalent to SUM(CASE WHEN ... THEN 1 ELSE 0 END) but slightly cleaner.
Selective Average (ignore a subset)
SELECT station_code,
AVG(amount_received) AS avg_all,
AVG(CASE WHEN payment_method = 'Credit_Card'
THEN amount_received END) AS avg_credit_card,
AVG(CASE WHEN payment_method <> 'Credit_Card'
THEN amount_received END) AS avg_non_credit
FROM payment_history
GROUP BY station_code;
AVG ignores NULL, so wrapping values in CASE WHEN without an ELSE naturally computes the average over only the matching rows.
Compare Two Time Windows in One Query
SELECT station_code,
SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-01-01' AND '2025-06-30'
THEN amount_received ELSE 0 END) AS h1_revenue,
SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-07-01' AND '2025-12-31'
THEN amount_received ELSE 0 END) AS h2_revenue,
SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-07-01' AND '2025-12-31'
THEN amount_received ELSE 0 END)
- SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-01-01' AND '2025-06-30'
THEN amount_received ELSE 0 END) AS growth
FROM payment_history
WHERE YEAR(paid_time) = 2025
GROUP BY station_code;
One table scan, two time windows, one comparison — no self-join or subquery needed. This is the most common use of conditional aggregation in business reporting.
Template
SELECT <group_col>,
COUNT(*) AS total,
COUNT(CASE WHEN <condition> THEN 1 END) AS count_if,
SUM(CASE WHEN <condition> THEN <value_col> ELSE 0 END) AS sum_if,
AVG(CASE WHEN <condition> THEN <value_col> END) AS avg_if,
ROUND(COUNT(CASE WHEN <condition> THEN 1 END) / COUNT(*) * 100, 1) AS pct
FROM <table>
GROUP BY <group_col>;
Common Mistakes
1. NOT IN Returns Nothing When the Subquery Contains NULL
-- BROKEN: if any parking_id in payment_history is NULL, this returns zero rows
SELECT * FROM parking_history
WHERE parking_id NOT IN (SELECT parking_id FROM payment_history);
-- SAFE: filter NULLs out, or use NOT EXISTS instead
SELECT * FROM parking_history
WHERE parking_id NOT IN (
SELECT parking_id FROM payment_history WHERE parking_id IS NOT NULL
);
NOT IN (... NULL ...) always evaluates to UNKNOWN, so every row is filtered out silently.
2. COUNT(*) vs COUNT(DISTINCT id) — Counting Rows vs Unique Users
-- Counts every row, including duplicates (same car entering multiple times)
SELECT station_code, COUNT(*) AS visits
FROM parking_history GROUP BY station_code;
-- Counts unique cars per station
SELECT station_code, COUNT(DISTINCT license_plate) AS unique_cars
FROM parking_history GROUP BY station_code;
Ask yourself: "Am I counting events or unique entities?" The answer determines which to use.
3. Using WHERE with Aggregate Functions
-- BROKEN: WHERE runs before aggregation, so COUNT(*) doesn't exist yet
SELECT station_code, COUNT(*) AS total
FROM parking_history
WHERE COUNT(*) > 100 -- ❌ error
GROUP BY station_code;
-- CORRECT: HAVING filters after aggregation
SELECT station_code, COUNT(*) AS total
FROM parking_history
GROUP BY station_code
HAVING COUNT(*) > 100; -- ✓
4. Using a SELECT Alias in WHERE (Execution Order Trap)
-- BROKEN: 'revenue' alias doesn't exist at the WHERE stage
SELECT station_code, SUM(amount_received) AS revenue
FROM payment_history
WHERE revenue > 10000 -- ❌ unknown column
GROUP BY station_code;
-- CORRECT: repeat the expression, or use HAVING
SELECT station_code, SUM(amount_received) AS revenue
FROM payment_history
GROUP BY station_code
HAVING SUM(amount_received) > 10000;
Refer to SQL Execution Order: WHERE (step 2) runs before SELECT (step 5), so aliases defined in SELECT don't exist yet.
5. ROW_NUMBER Doesn't Handle Ties the Way You Expect
-- If two rows share the same entry_time, ROW_NUMBER picks one arbitrarily
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY station_code ORDER BY entry_time DESC) AS rn
FROM parking_history
)
SELECT * FROM ranked WHERE rn = 1;
-- One tied row is silently dropped — use RANK() or DENSE_RANK() if ties matter
6. Division by Zero in Percentage Calculations
-- BROKEN: crashes if total_visits is 0
SELECT station_code, paid_count / total_visits * 100 AS pct_paid
FROM station_summary;
-- SAFE: wrap denominator with NULLIF
SELECT station_code,
ROUND(paid_count / NULLIF(total_visits, 0) * 100, 1) AS pct_paid
FROM station_summary;
7. LAG Crossing a Partition Boundary
-- LAG looks back across stations when there is no PARTITION BY
SELECT station_code, month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev -- ❌ bleeds across stations
FROM monthly_stats;
-- CORRECT: partition by the group you care about
SELECT station_code, month, revenue,
LAG(revenue) OVER (PARTITION BY station_code ORDER BY month) AS prev
FROM monthly_stats;
8. BETWEEN Is Inclusive on Both Ends — Risky with Timestamps
-- This misses records on Dec 31 after 00:00:00
WHERE paid_time BETWEEN '2025-01-01' AND '2025-12-31'
-- Safer: use >= and < with the next day/year
WHERE paid_time >= '2025-01-01' AND paid_time < '2026-01-01'
Multi-Pattern Composition
Real problems rarely fit one pattern. This example combines four patterns into a single monthly station report.
Problem: For each station in 2025, show monthly revenue, rank within its city, MoM growth, and a top-performer flag (top 25% by revenue that month).
Patterns used: Conditional Aggregation → Period-over-Period → Top N / Ranking → Conditional Aggregation (for the tier flag).
-- Step 1 (Pattern 10): aggregate monthly revenue per station
WITH monthly_revenue AS (
SELECT mt.station_code,
mt.station_name,
mt.city,
DATE_FORMAT(pay.paid_time, '%Y-%m') AS month,
SUM(pay.amount_received) AS revenue
FROM payment_history pay
JOIN management_table mt ON pay.station_code = mt.station_code
WHERE YEAR(pay.paid_time) = 2025
GROUP BY mt.station_code, mt.station_name, mt.city,
DATE_FORMAT(pay.paid_time, '%Y-%m')
),
-- Step 2 (Pattern 3): pull in the previous month's revenue via LAG
with_growth AS (
SELECT *,
LAG(revenue) OVER (
PARTITION BY station_code ORDER BY month
) AS prev_revenue
FROM monthly_revenue
),
-- Step 3 (Pattern 8): rank each station within its city per month,
-- and assign revenue quartile across all stations
with_rank AS (
SELECT *,
RANK() OVER (
PARTITION BY city, month ORDER BY revenue DESC
) AS city_rank,
NTILE(4) OVER (
PARTITION BY month ORDER BY revenue DESC
) AS quartile
FROM with_growth
)
-- Final SELECT: combine all metrics, label top performers
SELECT station_name,
city,
month,
revenue,
ROUND((revenue - prev_revenue) / NULLIF(prev_revenue, 0) * 100, 1) AS mom_pct,
city_rank,
CASE WHEN quartile = 1 THEN 'Top 25%' ELSE NULL END AS tier
FROM with_rank
ORDER BY month, city, city_rank;
Reading the composition:
- Each CTE does exactly one job — name it after what it produces, not how it works.
- Patterns that need the same base data share the first CTE rather than hitting the table twice.
- Window functions in different CTEs don't interfere with each other — layer them one at a time for clarity.
- NULLIF(prev_revenue, 0) guards the MoM division (Pattern mistake #6).