SQL Patterns

2026-04-24 Data Analysis 7 min read

SQL

SQL Patterns are reusable solutions to problems you will see again and again — at work and in interviews. Each pattern combines the building blocks from SQL Notes into one full approach. For an end-to-end example that uses several patterns at once, see RFM Analysis Notebook.

All examples use the same parking system tables: parking_history, payment_history, management_table, and charging_records.

Pattern Selection Guide

Use this table to find the right pattern before you read the full example.

Your problem	Pattern
Keep only the best / latest record per group	1. De-duplication
Find the Nth largest or smallest value	2. Nth Highest Value
Compare this period vs the one before	3. Period-over-Period
Show running totals or a smoothed trend line	4. Running Total & Moving Average
Measure drop-off between steps in a flow	5. Funnel Analysis
Find consecutive streaks or missing dates	6. Gap & Island
Track how many users come back month after month	7. Cohort Analysis
Get the top N records per group (not just top 1)	8. Top N per Group
Find records that exist in A but not in B	9. Anti Join
Compute several conditional metrics in one pass	10. Conditional Aggregation

When no single pattern fits, see Multi-Pattern Composition at the end.

Pattern 1: De-duplication (Keep One Row Per Group)

Problem: Each license plate has many parking sessions. Get only the most recent session per plate.

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY license_plate
               ORDER BY entry_time DESC
           ) AS rn
    FROM parking_history
)
SELECT license_plate, station_code, entry_time, exit_time
FROM ranked
WHERE rn = 1;

PARTITION BY license_plate — restart the counter for each car.
ORDER BY entry_time DESC — the latest session gets rn = 1.
Filter WHERE rn = 1 — keep only the winner per group.

This pattern answers: "Give me the latest / cheapest / highest record per group." Swap the ORDER BY direction and column to change what "best" means.

Variant: Remove Exact Duplicate Rows

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY license_plate, entry_time, station_code
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM parking_history
)
SELECT * FROM ranked WHERE rn = 1;

PARTITION BY every column that defines a duplicate. ORDER BY (SELECT NULL) means "no preference" — just pick one.

Template

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY <group_col> ORDER BY <sort_col> DESC) AS rn
    FROM <table>
)
SELECT * FROM ranked WHERE rn = 1;

Pattern 2: Nth Highest Value

Problem: Find the 3rd highest single payment ever made.

Method 1: DENSE_RANK (recommended)

WITH ranked AS (
    SELECT amount_received,
           DENSE_RANK() OVER (ORDER BY amount_received DESC) AS rnk
    FROM payment_history
)
SELECT amount_received
FROM ranked
WHERE rnk = 3;

Method 2: Subquery + LIMIT

SELECT MIN(amount_received) AS third_highest
FROM (
    SELECT DISTINCT amount_received
    FROM payment_history
    ORDER BY amount_received DESC
    LIMIT 3
) AS top3;

Use DENSE_RANK when ties should share the same rank (1, 2, 2, 3 ...).
Use ROW_NUMBER when you want strictly the Nth row, ignoring ties.
DISTINCT in the subquery makes sure tied values count as one position.

Template

WITH ranked AS (
    SELECT <value_col>,
           DENSE_RANK() OVER (ORDER BY <value_col> DESC) AS rnk
    FROM <table>
)
SELECT <value_col> FROM ranked WHERE rnk = <N>;

Pattern 3: Period-over-Period Comparison (MoM / YoY)

Problem: Calculate monthly charging spend and the change vs the previous month.

WITH monthly AS (
    SELECT DATE_FORMAT(paid_time, '%Y-%m') AS month,
           SUM(amount_received)            AS total_spent
    FROM payment_history
    GROUP BY DATE_FORMAT(paid_time, '%Y-%m')
),
with_lag AS (
    SELECT month,
           total_spent,
           LAG(total_spent) OVER (ORDER BY month) AS prev_month
    FROM monthly
)
SELECT month,
       total_spent,
       prev_month,
       total_spent - prev_month                               AS mom_change,
       ROUND((total_spent - prev_month) / prev_month * 100, 1) AS mom_pct
FROM with_lag
ORDER BY month;

First CTE: aggregate to the period you want (month, quarter, year).
Second CTE: use LAG to pull the previous period's value into the current row.
Final SELECT: subtract and divide — never repeat the LAG(...) expression in the same SELECT.

For YoY, use LAG(total_spent, 12) to look back 12 rows instead of 1.

Template

WITH periods AS (
    SELECT <period_expr>           AS period,
           <agg_func>(<value_col>) AS metric
    FROM <table>
    GROUP BY <period_expr>
),
with_lag AS (
    SELECT period, metric,
           LAG(metric) OVER (ORDER BY period) AS prev_metric
    FROM periods
)
SELECT period, metric, prev_metric,
       metric - prev_metric                               AS change,
       ROUND((metric - prev_metric) / prev_metric * 100, 1) AS pct_change
FROM with_lag;

Pattern 4: Running Total & Moving Average

Problem: Show running spend and a 3-month moving average next to the monthly numbers.

WITH monthly AS (
    SELECT DATE_FORMAT(paid_time, '%Y-%m') AS month,
           SUM(amount_received)            AS total_spent
    FROM payment_history
    GROUP BY DATE_FORMAT(paid_time, '%Y-%m')
)
SELECT month,
       total_spent,
       SUM(total_spent) OVER (
           ORDER BY month
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       )                                   AS running_total,
       ROUND(AVG(total_spent) OVER (
           ORDER BY month
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ), 0)                               AS moving_avg_3m
FROM monthly
ORDER BY month;

Window Frame Reference

Frame	Meaning
`ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`	First row up to now → running total
`ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`	Current + 2 rows back → 3-period moving avg
`ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`	Centered 3-period smoothing
`ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`	The whole partition

Without a frame clause, SUM() OVER (ORDER BY ...) defaults to UNBOUNDED PRECEDING AND CURRENT ROW — a running total. Always write the frame on purpose to be safe.

Template

WITH aggregated AS (
    SELECT <period_expr>           AS period,
           <agg_func>(<value_col>) AS metric
    FROM <table>
    GROUP BY <period_expr>
)
SELECT period, metric,
       SUM(metric) OVER (ORDER BY period
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       AVG(metric) OVER (ORDER BY period
           ROWS BETWEEN <N-1> PRECEDING AND CURRENT ROW)     AS moving_avg
FROM aggregated;

Pattern 5: Funnel Analysis

Problem: Of all cars that entered, how many paid? What is the drop-off rate?

WITH funnel AS (
    SELECT
        COUNT(DISTINCT ph.license_plate) AS step1_entered,
        COUNT(DISTINCT CASE
            WHEN pay.parking_id IS NOT NULL
            THEN ph.license_plate END)   AS step2_paid,
        COUNT(DISTINCT CASE
            WHEN pay.parking_id IS NOT NULL
             AND pay.amount_received > 0
            THEN ph.license_plate END)   AS step3_paid_nonzero
    FROM parking_history ph
    LEFT JOIN payment_history pay
      ON ph.parking_id = pay.parking_id
)
SELECT step1_entered,
       step2_paid,
       step3_paid_nonzero,
       ROUND(step2_paid         / step1_entered * 100, 1) AS pct_to_step2,
       ROUND(step3_paid_nonzero / step2_paid    * 100, 1) AS pct_to_step3
FROM funnel;

The core pattern is LEFT JOIN + COUNT(DISTINCT CASE WHEN ... IS NOT NULL THEN id END). - LEFT JOIN keeps every user from the first step, even if they never reached the next one. - CASE WHEN IS NOT NULL counts only those who did reach the next step.

Each column is one funnel stage. Divide neighboring stages to get the conversion rate.

Template

WITH funnel AS (
    SELECT
        COUNT(DISTINCT <user_id>)                                          AS step1,
        COUNT(DISTINCT CASE WHEN <step2_condition> THEN <user_id> END)    AS step2,
        COUNT(DISTINCT CASE WHEN <step3_condition> THEN <user_id> END)    AS step3
    FROM <base_table>
    LEFT JOIN <next_table> ON <join_condition>
)
SELECT step1, step2, step3,
       ROUND(step2 / step1 * 100, 1) AS pct_step2,
       ROUND(step3 / step2 * 100, 1) AS pct_step3
FROM funnel;

Pattern 6: Gap & Island

Finding Islands (Consecutive Streaks)

Problem: Find each station's consecutive-day activity streaks and how long each one lasted.

WITH daily AS (
    SELECT station_code,
           DATE(entry_time) AS activity_date
    FROM parking_history
    GROUP BY station_code, DATE(entry_time)
),
grouped AS (
    SELECT station_code,
           activity_date,
           DATE_SUB(
               activity_date,
               INTERVAL ROW_NUMBER() OVER (
                   PARTITION BY station_code
                   ORDER BY activity_date
               ) DAY
           ) AS grp
    FROM daily
)
SELECT station_code,
       MIN(activity_date) AS streak_start,
       MAX(activity_date) AS streak_end,
       COUNT(*)           AS streak_days
FROM grouped
GROUP BY station_code, grp
ORDER BY station_code, streak_start;

The island trick: subtract the row number from the date. Consecutive dates go down by 1 each row while the row number goes up by 1, so their difference stays the same. Non-consecutive dates produce a different difference, which makes a new group.

Finding Gaps (Missing Days)

Problem: Find dates in 2025 where no payments happened.

WITH RECURSIVE date_series AS (
    SELECT DATE('2025-01-01') AS dt
    UNION ALL
    SELECT DATE_ADD(dt, INTERVAL 1 DAY)
    FROM date_series
    WHERE dt < '2025-12-31'
),
active_days AS (
    SELECT DISTINCT DATE(paid_time) AS dt
    FROM payment_history
    WHERE paid_time BETWEEN '2025-01-01' AND '2025-12-31'
)
SELECT ds.dt AS gap_date
FROM date_series ds
LEFT JOIN active_days ad ON ds.dt = ad.dt
WHERE ad.dt IS NULL
ORDER BY ds.dt;

WITH RECURSIVE builds the full date series. LEFT JOIN + WHERE IS NULL then shows the dates with no activity. This is the anti-join pattern applied to time.

Template

-- Islands (consecutive streaks)
WITH daily AS (
    SELECT <group_col>, DATE(<date_col>) AS dt
    FROM <table> GROUP BY <group_col>, DATE(<date_col>)
),
grouped AS (
    SELECT <group_col>, dt,
           DATE_SUB(dt, INTERVAL ROW_NUMBER() OVER (
               PARTITION BY <group_col> ORDER BY dt) DAY) AS grp
    FROM daily
)
SELECT <group_col>, MIN(dt) AS start, MAX(dt) AS end, COUNT(*) AS length
FROM grouped GROUP BY <group_col>, grp;

-- Gaps (missing dates)
WITH RECURSIVE dates AS (
    SELECT DATE('<start>') AS dt
    UNION ALL
    SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM dates WHERE dt < '<end>'
),
active AS (SELECT DISTINCT DATE(<date_col>) AS dt FROM <table>)
SELECT d.dt FROM dates d LEFT JOIN active a ON d.dt = a.dt WHERE a.dt IS NULL;

Pattern 7: Cohort Analysis

Problem: Group cars by the month they first parked. Track how many came back in each later month.

WITH first_visit AS (
    SELECT license_plate,
           DATE_FORMAT(MIN(entry_time), '%Y-%m') AS cohort_month
    FROM parking_history
    GROUP BY license_plate
),
activity AS (
    SELECT ph.license_plate,
           fv.cohort_month,
           TIMESTAMPDIFF(
               MONTH,
               STR_TO_DATE(CONCAT(fv.cohort_month, '-01'), '%Y-%m-%d'),
               DATE(ph.entry_time)
           ) AS month_offset
    FROM parking_history ph
    JOIN first_visit fv ON ph.license_plate = fv.license_plate
)
SELECT cohort_month,
       COUNT(DISTINCT CASE WHEN month_offset = 0 THEN license_plate END) AS m0,
       COUNT(DISTINCT CASE WHEN month_offset = 1 THEN license_plate END) AS m1,
       COUNT(DISTINCT CASE WHEN month_offset = 2 THEN license_plate END) AS m2,
       COUNT(DISTINCT CASE WHEN month_offset = 3 THEN license_plate END) AS m3,
       COUNT(DISTINCT CASE WHEN month_offset = 4 THEN license_plate END) AS m4
FROM activity
GROUP BY cohort_month
ORDER BY cohort_month;

Steps: 1. first_visit — find each user's cohort (the month they first showed up). 2. activity — join back to all sessions; TIMESTAMPDIFF turns each visit date into "months since the cohort". 3. Final SELECT — pivot month_offset into columns using the CASE WHEN pivot pattern from SQL Notes.

m0 is always the cohort size. m1 / m0 is the Month-1 retention rate.

Template

WITH first_event AS (
    SELECT <user_id>,
           DATE_FORMAT(MIN(<event_date>), '%Y-%m') AS cohort_month
    FROM <table> GROUP BY <user_id>
),
activity AS (
    SELECT t.<user_id>, fe.cohort_month,
           TIMESTAMPDIFF(MONTH,
               STR_TO_DATE(CONCAT(fe.cohort_month, '-01'), '%Y-%m-%d'),
               DATE(t.<event_date>)) AS month_offset
    FROM <table> t JOIN first_event fe ON t.<user_id> = fe.<user_id>
)
SELECT cohort_month,
       COUNT(DISTINCT CASE WHEN month_offset = 0 THEN <user_id> END) AS m0,
       COUNT(DISTINCT CASE WHEN month_offset = 1 THEN <user_id> END) AS m1,
       COUNT(DISTINCT CASE WHEN month_offset = 2 THEN <user_id> END) AS m2
FROM activity GROUP BY cohort_month ORDER BY cohort_month;

Pattern 8: Top N per Group

Problem: Get the top 3 most recent parking sessions for each station.

WITH ranked AS (
    SELECT station_code, license_plate, entry_time,
           ROW_NUMBER() OVER (
               PARTITION BY station_code
               ORDER BY entry_time DESC
           ) AS rn
    FROM parking_history
)
SELECT station_code, license_plate, entry_time
FROM ranked
WHERE rn <= 3;

Change <= 3 to <= N for any N. The only difference from de-duplication is rn = 1 vs rn <= N.

Handling Ties

If you want every tie included (for example, 3rd place has two records — show both):

WITH ranked AS (
    SELECT station_code, license_plate, amount_received,
           DENSE_RANK() OVER (
               PARTITION BY station_code
               ORDER BY amount_received DESC
           ) AS rnk
    FROM payment_history
)
SELECT station_code, license_plate, amount_received
FROM ranked
WHERE rnk <= 3;

ROW_NUMBER: strict top N, ties broken at random.
DENSE_RANK: top N distinct values, every tied row included.
RANK: top N rows by rank number — may return more than N rows if there are ties before position N.

Template

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY <group_col> ORDER BY <sort_col> DESC) AS rn
    FROM <table>
)
SELECT * FROM ranked WHERE rn <= <N>;

Pattern 9: Anti Join (Records with No Match)

Problem: Find every parking session that has no matching payment record.

Three approaches that do the same thing — each with different trade-offs:

Method 1: LEFT JOIN + IS NULL (most common)

SELECT ph.license_plate, ph.station_code, ph.entry_time
FROM parking_history ph
LEFT JOIN payment_history pay
  ON ph.parking_id = pay.parking_id
WHERE pay.parking_id IS NULL;

Keep every row from the left table, then filter to those where the join found nothing. Fast and supported by every database.

Method 2: NOT EXISTS (most readable for complex conditions)

SELECT ph.license_plate, ph.station_code, ph.entry_time
FROM parking_history ph
WHERE NOT EXISTS (
    SELECT 1
    FROM payment_history pay
    WHERE pay.parking_id = ph.parking_id
);

Stops searching as soon as one match is found — can be faster when matches are common. SELECT 1 is the convention; only "does it exist" matters.

Method 3: NOT IN (use with caution)

SELECT license_plate, station_code, entry_time
FROM parking_history
WHERE parking_id NOT IN (
    SELECT parking_id FROM payment_history
);

Warning: if the subquery returns any NULL, NOT IN returns no rows at all — a silent bug. Only use NOT IN when you can promise the subquery column has no NULLs.

When to use which

Method	Best when
LEFT JOIN + IS NULL	Simple join condition, general purpose
NOT EXISTS	Multi-column or complex match conditions
NOT IN	The subquery column is guaranteed NOT NULL

Template

-- LEFT JOIN + IS NULL
SELECT a.* FROM <table_a> a
LEFT JOIN <table_b> b ON a.<key> = b.<key>
WHERE b.<key> IS NULL;

-- NOT EXISTS
SELECT * FROM <table_a> a
WHERE NOT EXISTS (
    SELECT 1 FROM <table_b> b WHERE b.<key> = a.<key>
);

Pattern 10: Conditional Aggregation

Problem: Calculate several metrics split by a condition in one query pass — without subqueries.

Count / Sum by Condition

SELECT station_code,
       COUNT(*)                                                        AS total_sessions,
       COUNT(CASE WHEN parking_type = 'hourly'   THEN 1 END)          AS hourly_count,
       COUNT(CASE WHEN parking_type = 'monthly'  THEN 1 END)          AS monthly_count,
       SUM(CASE WHEN parking_type = 'hourly'     THEN 1 ELSE 0 END)   AS hourly_sum,
       ROUND(
           COUNT(CASE WHEN parking_type = 'monthly' THEN 1 END)
           / COUNT(*) * 100, 1
       )                                                               AS monthly_pct
FROM parking_history
GROUP BY station_code;

COUNT(CASE WHEN ... THEN 1 END) — leaving out ELSE means non-matching rows return NULL, which COUNT ignores. Same effect as SUM(CASE WHEN ... THEN 1 ELSE 0 END) but slightly cleaner.

Selective Average (ignore a subset)

SELECT station_code,
       AVG(amount_received)                                            AS avg_all,
       AVG(CASE WHEN payment_method = 'Credit_Card'
               THEN amount_received END)                               AS avg_credit_card,
       AVG(CASE WHEN payment_method <> 'Credit_Card'
               THEN amount_received END)                               AS avg_non_credit
FROM payment_history
GROUP BY station_code;

AVG ignores NULL, so wrapping values in CASE WHEN without an ELSE gives you the average over only the matching rows.

Compare Two Time Windows in One Query

SELECT station_code,
       SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-01-01' AND '2025-06-30'
               THEN amount_received ELSE 0 END) AS h1_revenue,
       SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-07-01' AND '2025-12-31'
               THEN amount_received ELSE 0 END) AS h2_revenue,
       SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-07-01' AND '2025-12-31'
               THEN amount_received ELSE 0 END)
       - SUM(CASE WHEN DATE(paid_time) BETWEEN '2025-01-01' AND '2025-06-30'
               THEN amount_received ELSE 0 END)                        AS growth
FROM payment_history
WHERE YEAR(paid_time) = 2025
GROUP BY station_code;

One table scan, two time windows, one comparison — no self-join or subquery needed. This is the most common use of conditional aggregation in business reporting.

Template

SELECT <group_col>,
       COUNT(*)                                                             AS total,
       COUNT(CASE WHEN <condition>        THEN 1 END)                      AS count_if,
       SUM(CASE WHEN <condition>          THEN <value_col> ELSE 0 END)     AS sum_if,
       AVG(CASE WHEN <condition>          THEN <value_col> END)            AS avg_if,
       ROUND(COUNT(CASE WHEN <condition>  THEN 1 END) / COUNT(*) * 100, 1) AS pct
FROM <table>
GROUP BY <group_col>;

Common Mistakes

1. NOT IN Returns Nothing When the Subquery Contains NULL

-- BROKEN: if any parking_id in payment_history is NULL, this returns zero rows
SELECT * FROM parking_history
WHERE parking_id NOT IN (SELECT parking_id FROM payment_history);

-- SAFE: filter out NULLs, or use NOT EXISTS instead
SELECT * FROM parking_history
WHERE parking_id NOT IN (
    SELECT parking_id FROM payment_history WHERE parking_id IS NOT NULL
);

NOT IN (... NULL ...) always returns UNKNOWN, so every row is silently dropped.

2. COUNT(*) vs COUNT(DISTINCT id) — Counting Rows vs Unique Users

-- Counts every row, including repeats (same car entering many times)
SELECT station_code, COUNT(*) AS visits
FROM parking_history GROUP BY station_code;

-- Counts unique cars per station
SELECT station_code, COUNT(DISTINCT license_plate) AS unique_cars
FROM parking_history GROUP BY station_code;

Ask yourself: "Am I counting events or unique entities?" The answer tells you which one to use.

3. Using WHERE with Aggregate Functions

-- BROKEN: WHERE runs before aggregation, so COUNT(*) does not exist yet
SELECT station_code, COUNT(*) AS total
FROM parking_history
WHERE COUNT(*) > 100          -- ❌ error
GROUP BY station_code;

-- CORRECT: HAVING filters after aggregation
SELECT station_code, COUNT(*) AS total
FROM parking_history
GROUP BY station_code
HAVING COUNT(*) > 100;        -- ✓

4. Using a SELECT Alias in WHERE (Execution Order Trap)

-- BROKEN: the 'revenue' alias does not exist at the WHERE step
SELECT station_code, SUM(amount_received) AS revenue
FROM payment_history
WHERE revenue > 10000          -- ❌ unknown column
GROUP BY station_code;

-- CORRECT: repeat the expression, or use HAVING
SELECT station_code, SUM(amount_received) AS revenue
FROM payment_history
GROUP BY station_code
HAVING SUM(amount_received) > 10000;

Look at the SQL Execution Order: WHERE (step 2) runs before SELECT (step 5), so aliases defined in SELECT do not exist yet.

5. ROW_NUMBER Does Not Handle Ties the Way You Expect

-- If two rows share the same entry_time, ROW_NUMBER picks one at random
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY station_code ORDER BY entry_time DESC) AS rn
    FROM parking_history
)
SELECT * FROM ranked WHERE rn = 1;
-- One tied row is silently dropped — use RANK() or DENSE_RANK() if ties matter

6. Division by Zero in Percentage Calculations

-- BROKEN: crashes if total_visits is 0
SELECT station_code, paid_count / total_visits * 100 AS pct_paid
FROM station_summary;

-- SAFE: wrap the denominator with NULLIF
SELECT station_code,
       ROUND(paid_count / NULLIF(total_visits, 0) * 100, 1) AS pct_paid
FROM station_summary;

7. LAG Crossing a Partition Boundary

-- LAG looks back across stations when there is no PARTITION BY
SELECT station_code, month, revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev   -- ❌ leaks across stations
FROM monthly_stats;

-- CORRECT: partition by the group you care about
SELECT station_code, month, revenue,
       LAG(revenue) OVER (PARTITION BY station_code ORDER BY month) AS prev
FROM monthly_stats;

8. BETWEEN Is Inclusive on Both Ends — Risky with Timestamps

-- This misses records on Dec 31 after 00:00:00
WHERE paid_time BETWEEN '2025-01-01' AND '2025-12-31'

-- Safer: use >= and < with the next day / year
WHERE paid_time >= '2025-01-01' AND paid_time < '2026-01-01'

Multi-Pattern Composition

Real problems rarely fit one pattern. This example combines four patterns into one monthly station report.

Problem: For each station in 2025, show monthly revenue, rank within its city, MoM growth, and a top-performer flag (top 25% by revenue that month).

Patterns used: Conditional Aggregation → Period-over-Period → Top N / Ranking → Conditional Aggregation (for the tier flag).

-- Step 1 (Pattern 10): aggregate monthly revenue per station
WITH monthly_revenue AS (
    SELECT mt.station_code,
           mt.station_name,
           mt.city,
           DATE_FORMAT(pay.paid_time, '%Y-%m') AS month,
           SUM(pay.amount_received)            AS revenue
    FROM payment_history pay
    JOIN management_table mt ON pay.station_code = mt.station_code
    WHERE YEAR(pay.paid_time) = 2025
    GROUP BY mt.station_code, mt.station_name, mt.city,
             DATE_FORMAT(pay.paid_time, '%Y-%m')
),

-- Step 2 (Pattern 3): pull in the previous month's revenue with LAG
with_growth AS (
    SELECT *,
           LAG(revenue) OVER (
               PARTITION BY station_code ORDER BY month
           ) AS prev_revenue
    FROM monthly_revenue
),

-- Step 3 (Pattern 8): rank each station within its city per month,
--         and assign a revenue quartile across all stations
with_rank AS (
    SELECT *,
           RANK() OVER (
               PARTITION BY city, month ORDER BY revenue DESC
           )                                              AS city_rank,
           NTILE(4) OVER (
               PARTITION BY month ORDER BY revenue DESC
           )                                              AS quartile
    FROM with_growth
)

-- Final SELECT: combine all the metrics, label the top performers
SELECT station_name,
       city,
       month,
       revenue,
       ROUND((revenue - prev_revenue) / NULLIF(prev_revenue, 0) * 100, 1) AS mom_pct,
       city_rank,
       CASE WHEN quartile = 1 THEN 'Top 25%' ELSE NULL END                AS tier
FROM with_rank
ORDER BY month, city, city_rank;

How to read the composition: - Each CTE does exactly one job — name it after what it produces, not how it works. - Patterns that need the same base data share the first CTE instead of hitting the table twice. - Window functions in different CTEs do not interfere with each other — layer them one at a time for clarity. - NULLIF(prev_revenue, 0) guards the MoM division (Common Mistake #6).

← Back to Blog