SQL Notebook

2026-03-06 Data Analysis 8 min read

SQL

All examples use MySQL syntax unless noted. Most of it is standard SQL, but date/string functions differ by database — see the MySQL vs PostgreSQL quick map at the end when working in Postgres.

Basic Query & Sorting

SELECT

SELECT license_plate, entry_time
FROM parking_history;

* returns every column.

Column Aliases

SELECT parking_id,
       parked_hours * 40 AS amount
FROM parking_history;

When you compute something, use AS to give it a clear name.

Removing Duplicates

SELECT DISTINCT parking_type
FROM parking_history;

DISTINCT makes sure the same row does not appear more than once.

ORDER BY Clause

SELECT *
FROM parking_history
ORDER BY entry_time ASC;

ORDER BY sorts the results. ASC means earliest first.

Filtering & Conditions

WHERE Clause

SELECT *
FROM parking_history
WHERE license_plate = 'AAA-1111';

WHERE filters the rows. Here, it returns the details for the license plate AAA-1111.

Conditions with AND, OR, NOT

SELECT license_plate, payment_method, amount, is_electric
FROM payment_history
WHERE (payment_method = 'Credit_Card' AND amount > 5000)
   OR (is_electric = 1);

AND: all conditions must be true.
OR: at least one condition must be true.
NOT: flips a condition.

SQL checks these in this order: NOT → AND → OR. Always use brackets () for complex logic so you don't make a mistake.

Not Equal (<>)

SELECT parking_id, parking_type, entry_time, exit_time
FROM parking_history
WHERE parking_type <> 'monthly';

<> means "not equal". This returns every non-monthly parking record. Same as WHERE NOT parking_type = 'monthly'. <> is shorter and very common.

Check for Missing Values (IS NULL)

SELECT station_code, license_plate, entry_time
FROM parking_history
WHERE exit_time IS NULL;

IS NULL finds rows with no value (the car is still parked).
Never use = NULL or <> NULL — they don't work.
Use IS NULL or IS NOT NULL only.

Date Range (BETWEEN or >= / <)

SELECT COUNT(*) AS total_cars
FROM parking_history
WHERE entry_time >= '2025-01-01'
  AND entry_time <  '2026-01-01';

The best way to cover a full year: >= start AND < next_year_start.
Safe even with timestamps (hours, minutes, seconds).
BETWEEN also works, but it can miss records on Dec 31 if the time is not exactly 00:00:00.

IN (Match a List)

SELECT station_code, station_name, open_date
FROM management_table
WHERE YEAR(open_date) IN (2025, 2024, 2023);

IN checks if the value is in the list. Cleaner than many ORs.

Pattern Matching (LIKE)

SELECT station_code, license_plate
FROM parking_history
WHERE license_plate LIKE 'E%';

LIKE finds text patterns. - % = any characters (zero or more) - _ = exactly one character

Aggregation & Grouping

Aggregate Functions

SELECT station_code,
       COUNT(*) AS vehicle_count,
       AVG(TIMESTAMPDIFF(MINUTE, entry_time, exit_time)) AS avg_duration_mins,
       MAX(TIMESTAMPDIFF(MINUTE, entry_time, exit_time)) AS max_duration_mins,
       MIN(TIMESTAMPDIFF(MINUTE, entry_time, exit_time)) AS min_duration_mins
FROM parking_history
WHERE exit_time IS NOT NULL
GROUP BY station_code;

Per group: COUNT(*) → row count, AVG() → average, MAX/MIN → highest / lowest value.
Use WHERE first to filter rows before grouping.

Basic Grouping (GROUP BY)

SELECT station_code, COUNT(*) AS total_entries
FROM parking_history
GROUP BY station_code
ORDER BY total_entries DESC;

Group by Several Columns

SELECT city, station_type, COUNT(*) AS station_count
FROM management_table
GROUP BY city, station_type
ORDER BY city, station_type;

GROUP BY city, then station_type.

Grouping via Expressions

SELECT HOUR(entry_time) AS entry_hour,
       COUNT(*) AS traffic_volume
FROM parking_history
GROUP BY HOUR(entry_time)
ORDER BY entry_hour;

GROUP BY HOUR(entry_time) → hourly buckets (0–23). You can analyze trends without adding extra columns.

Filter Groups (HAVING)

SELECT station_code, COUNT(*) AS total_visits
FROM parking_history
GROUP BY station_code
HAVING COUNT(*) > 1000;

HAVING filters AFTER grouping.
WHERE filters BEFORE (raw rows).
You cannot use HAVING on non-aggregates.

Add Totals (ROLLUP)

SELECT payment_method, COUNT(*) AS tx_count
FROM payment_history
GROUP BY payment_method WITH ROLLUP;

WITH ROLLUP adds subtotal rows + a grand total (NULL row). Quick reports without extra queries.

Table Joins

FROM Clause & Table Aliases

SELECT ph.station_code, ph.entry_time, py.amount
FROM parking_history ph
JOIN payment_history py
  ON py.parking_id = ph.parking_id;

FROM tells the query which table to read. Short aliases (like ph, py) make the query shorter and easier to read, especially when you join tables.

Inner Join (Most Common)

SELECT ph.license_plate, ph.entry_time, mt.bd_name, mt.city
FROM parking_history ph
INNER JOIN management_table mt
  ON ph.station_code = mt.station_code;

INNER JOIN keeps only rows that have a match in both tables.

Left Join (Keep All from the Left Table)

SELECT mt.station_name,
       COALESCE(SUM(pay.amount_received), 0) AS total_revenue
FROM management_table mt
LEFT JOIN payment_history pay
  ON mt.station_code = pay.station_code
 AND DATE(pay.paid_time) = CURRENT_DATE
GROUP BY mt.station_name;

Joining Three Tables

SELECT ph.license_plate, mt.area, pay.payment_method, pay.paid_time
FROM parking_history ph
INNER JOIN management_table mt
  ON ph.station_code = mt.station_code
INNER JOIN payment_history pay
  ON ph.station_code = pay.station_code
 AND ph.entry_time   = pay.entry_time;

Start with parking_history → add station info → add payment info.

Chain Several Outer Joins

SELECT mt.station_name, ph.entry_time, pay.amount_received
FROM management_table mt
LEFT JOIN parking_history ph
  ON mt.station_code = ph.station_code
LEFT JOIN payment_history pay
  ON ph.station_code = pay.station_code
 AND ph.entry_time  = pay.entry_time;

Self-Join

SELECT t1.station_name AS station_a, t1.bd_team AS team_a,
       t2.station_name AS station_b, t2.bd_team AS team_b, t1.area
FROM management_table t1
INNER JOIN management_table t2
  ON t1.area    = t2.area
 AND t1.bd_team <> t2.bd_team;

Use two aliases (t1, t2) for the same table. Find pairs of stations in the same area but different teams.

Cross Join (Every Combo)

SELECT mt.station_name, d.day_num
FROM management_table mt
CROSS JOIN (
    SELECT 1 AS day_num UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 5
) d
ORDER BY mt.station_name, d.day_num;

Set Operations

The UNION Operator

SELECT station_code FROM parking_history
UNION
SELECT station_code FROM payment_history;

UNION stacks two query results and removes duplicates. Rules: same number of columns, compatible data types, column names from the first query.

UNION ALL (Combine + Keep All Rows)

SELECT *
FROM (
    SELECT entry_time AS event_time, 'Vehicle Entry' AS event_type
    FROM parking_history
    WHERE station_code = 'ST001'
    UNION ALL
    SELECT paid_time AS event_time, 'Payment Made' AS event_type
    FROM payment_history
    WHERE station_code = 'ST001'
) AS events
ORDER BY event_time;

UNION ALL keeps every row (faster, no de-duplication). A derived table (subquery in FROM) always needs an alias (AS events) — leaving it out is a syntax error in both MySQL and PostgreSQL.

INTERSECT (Rows in Both Queries)

MySQL supports INTERSECT / EXCEPT only from 8.0.31. On older versions, emulate with IN / NOT EXISTS. PostgreSQL has always supported both.

SELECT station_code, entry_time
FROM parking_history
INTERSECT
SELECT station_code, entry_time
FROM payment_history;

INTERSECT returns only rows that appear in BOTH results.

EXCEPT (Rows in First but Not Second)

SELECT station_code, entry_time
FROM parking_history
EXCEPT
SELECT station_code, entry_time
FROM payment_history;

EXCEPT returns rows that are only in the FIRST query.

Subqueries

Basic Subquery

SELECT station_code, station_name, open_date
FROM management_table
WHERE open_date = (
    SELECT MAX(open_date)
    FROM management_table
);

The inner query runs first to find the latest open_date.
The outer query gets the stations that opened on that date.

Scalar Subquery in SELECT

SELECT mt.station_name,
       mt.city,
       (SELECT COUNT(*)
        FROM parking_history ph
        WHERE ph.station_code = mt.station_code) AS total_visits
FROM management_table mt;

The subquery returns one value per row.

IN Subquery (Filter by List)

SELECT mt.station_name, ph.license_plate, ph.entry_time
FROM parking_history ph
JOIN management_table mt
  ON mt.station_code = ph.station_code
WHERE ph.station_code IN (
    SELECT station_code
    FROM management_table
    WHERE area = 'Xinyi Dist'
);

Non-correlated (the subquery runs once). The subquery → list of station_codes in Xinyi.

Derived Table (Subquery in FROM)

SELECT station_code, day_of_record, daily_total
FROM (
    SELECT station_code,
           DATE(paid_time) AS day_of_record,
           SUM(amount_received) AS daily_total
    FROM payment_history
    GROUP BY station_code, DATE(paid_time)
) AS daily_stats
WHERE daily_total > 10000;

Inner → daily revenue per station.
Outer → keep only the high-revenue days (> 10,000).

NOT EXISTS (Find Unmatched Rows)

SELECT ph.license_plate, ph.station_code, ph.entry_time
FROM parking_history ph
WHERE NOT EXISTS (
    SELECT 1
    FROM payment_history pay
    WHERE pay.station_code = ph.station_code
      AND pay.entry_time   = ph.entry_time
);

Correlated subquery: it runs once for every row in the outer query (parking_history).
If NO matching record is found in the inner query (payment_history) → it returns TRUE and the row is kept.
Use case: finding unpaid parking sessions. SELECT 1 is used because the actual data does not matter — only whether the row exists.

Conditional Logic

Simple CASE (Exact Match Mapping)

SELECT station_name,
       CASE station_type
           WHEN 1 THEN 'Indoor Garage'
           WHEN 2 THEN 'Outdoor Lot'
           WHEN 3 THEN 'Street Parking'
           ELSE 'Unknown'
       END AS type_description
FROM management_table;

Match exact values only.

Searched CASE (Ranges & Complex Rules)

SELECT ticket_id,
       TIMESTAMPDIFF(HOUR, entry_time, exit_time) AS duration_hours,
       CASE
           WHEN TIMESTAMPDIFF(HOUR, entry_time, exit_time) < 2 THEN 'Short-term'
           WHEN TIMESTAMPDIFF(HOUR, entry_time, exit_time) BETWEEN 2 AND 8 THEN 'Medium'
           ELSE 'Long-term'
       END AS duration_category
FROM parking_history
WHERE exit_time IS NOT NULL;

You can check any condition (ranges, >, <, AND/OR).

LIMIT & OFFSET

LIMIT (Top N Rows)

SELECT license_plate, entry_time
FROM parking_history
ORDER BY entry_time DESC
LIMIT 10;

LIMIT returns only the first N rows. Always pair it with ORDER BY, otherwise the "top" rows are random.

OFFSET (Pagination)

SELECT license_plate, entry_time
FROM parking_history
ORDER BY entry_time DESC
LIMIT 10 OFFSET 20;

OFFSET skips the first N rows. Page 1 → OFFSET 0, Page 2 → OFFSET 10, Page 3 → OFFSET 20.

SQL Execution Order

SQL clauses are written in one order but run in a different order:

Step	Clause	What it does
1	FROM / JOIN	Load tables and combine rows
2	WHERE	Filter raw rows
3	GROUP BY	Group the remaining rows
4	HAVING	Filter groups
5	SELECT	Pick columns, run expressions
6	DISTINCT	Remove duplicate rows
7	ORDER BY	Sort results
8	LIMIT / OFFSET	Cut to the final row count

This is why: - You cannot use a SELECT alias in WHERE (the alias does not exist yet at step 2). - You cannot use aggregate functions in WHERE — use HAVING instead. - You can use a SELECT alias in ORDER BY (the alias exists by then).

CTEs (Common Table Expressions)

Basic CTE

WITH station_revenue AS (
    SELECT station_code,
           SUM(amount_received) AS total_revenue
    FROM payment_history
    GROUP BY station_code
)
SELECT mt.station_name, sr.total_revenue
FROM management_table mt
JOIN station_revenue sr
  ON mt.station_code = sr.station_code
ORDER BY sr.total_revenue DESC;

WITH names a temporary result set. The outer query reads it like a normal table. Prefer CTEs over nested subqueries — same logic, much easier to read.

Several CTEs

WITH active_stations AS (
    SELECT DISTINCT station_code
    FROM parking_history
    WHERE entry_time >= '2025-01-01'
),
station_revenue AS (
    SELECT station_code,
           SUM(amount_received) AS total_revenue
    FROM payment_history
    GROUP BY station_code
)
SELECT mt.station_name, sr.total_revenue
FROM management_table mt
JOIN active_stations a  ON mt.station_code = a.station_code
JOIN station_revenue sr ON mt.station_code = sr.station_code;

Chain several CTEs with commas. Each one can use the ones before it.

Window Functions

Window functions compute values across a set of rows without collapsing them into one row (unlike GROUP BY).

ROW_NUMBER (Rank Within a Group)

SELECT station_code, license_plate, entry_time,
       ROW_NUMBER() OVER (
           PARTITION BY station_code
           ORDER BY entry_time DESC
       ) AS visit_rank
FROM parking_history;

PARTITION BY: reset the counter for each group (like GROUP BY, but you keep all the rows).
ORDER BY: sets the order inside each partition.
ROW_NUMBER always gives a unique number, even on ties.

RANK vs DENSE_RANK

SELECT station_code,
       SUM(amount_received) AS revenue,
       RANK()       OVER (ORDER BY SUM(amount_received) DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY SUM(amount_received) DESC) AS drnk
FROM payment_history
GROUP BY station_code;

RANK: ties share the same rank, the next rank skips numbers (1, 2, 2, 4).
DENSE_RANK: ties share the same rank, the next rank does not skip (1, 2, 2, 3).
RANK and DENSE_RANK are reserved words in MySQL 8 — never use them as column aliases (use rnk / drnk).

LAG / LEAD (Compare to Previous or Next Row)

SELECT station_code,
       DATE(paid_time)                          AS day,
       SUM(amount_received)                     AS daily_revenue,
       LAG(SUM(amount_received))  OVER (
           PARTITION BY station_code
           ORDER BY DATE(paid_time)
       )                                        AS prev_day_revenue,
       LEAD(SUM(amount_received)) OVER (
           PARTITION BY station_code
           ORDER BY DATE(paid_time)
       )                                        AS next_day_revenue
FROM payment_history
GROUP BY station_code, DATE(paid_time);

LAG: looks at the previous row's value.
LEAD: looks at the next row's value.
Useful for day-over-day or month-over-month comparisons.

Running Total (SUM OVER)

SELECT DATE(paid_time)      AS day,
       SUM(amount_received) AS daily_revenue,
       SUM(SUM(amount_received)) OVER (
           ORDER BY DATE(paid_time)
       )                    AS running_total
FROM payment_history
GROUP BY DATE(paid_time)
ORDER BY day;

OVER (ORDER BY ...) with no PARTITION BY runs across the whole result set.

NULL Handling

COALESCE (First Non-NULL Value)

SELECT station_code,
       license_plate,
       COALESCE(exit_time, 'Still Parked') AS status
FROM parking_history;

COALESCE returns the first argument that is not NULL. Useful for replacing NULLs with a default. MySQL silently converts the timestamp to text here; PostgreSQL is strict about types — cast first: COALESCE(exit_time::text, 'Still Parked').

-- Several fallbacks
SELECT COALESCE(phone_number, email, 'No contact') AS contact
FROM customer_table;

It checks left to right and returns the first value that is not NULL.

NULLIF (Return NULL on a Match)

SELECT station_code,
       total_visits,
       total_revenue / NULLIF(total_visits, 0) AS revenue_per_visit
FROM station_summary;

NULLIF(a, b) returns NULL if a = b, otherwise it returns a. Classic use: prevent divide-by-zero errors.

String & Date Functions

String Functions

SELECT UPPER(station_name)             AS name_upper,     -- 'north lot' → 'NORTH LOT'
       LOWER(station_name)             AS name_lower,     -- 'NORTH LOT' → 'north lot'
       TRIM(station_name)              AS name_trimmed,   -- removes spaces at start and end
       LENGTH(station_name)            AS name_length,    -- number of characters
       SUBSTRING(license_plate, 1, 3)  AS plate_prefix,  -- first 3 chars: 'AAA'
       CONCAT(city, ' - ', station_name) AS full_name    -- join strings together
FROM management_table;

Date Functions

SELECT entry_time,
       DATE(entry_time)                       AS date_only,       -- '2025-06-15'
       DATE_FORMAT(entry_time, '%Y-%m')       AS year_month,      -- '2025-06'
       DATEDIFF(exit_time, entry_time)        AS days_parked,     -- difference in days
       DATE_ADD(entry_time, INTERVAL 2 HOUR)  AS two_hours_later,
       DATE_SUB(entry_time, INTERVAL 1 DAY)   AS day_before,
       NOW()                                  AS current_datetime,
       CURDATE()                              AS today
FROM parking_history;

DATE() drops the time part.
DATE_FORMAT() converts to any string format (%Y = year, %m = month, %d = day, %H = hour).
DATEDIFF(end, start) returns the number of days between two dates.
DATE_ADD / DATE_SUB shift a date by an interval (HOUR, DAY, MONTH, YEAR).

Type Conversion

CAST (Convert Data Types)

SELECT license_plate,
       CAST(parked_hours AS DECIMAL(10, 2)) AS hours_decimal,
       CAST(entry_time AS DATE)             AS entry_date,
       CAST(amount AS CHAR)                 AS amount_text
FROM parking_history;

CAST(value AS type) converts a value to the target type. Common target types: INT, DECIMAL(p, s), CHAR, DATE, DATETIME.

Practical Use: Fixing Messy Data

-- A string stored as a number → you cannot do math without CAST
SELECT station_code,
       SUM(CAST(revenue_text AS DECIMAL(10, 2))) AS total_revenue
FROM raw_import_table
GROUP BY station_code;

Real-world data often comes in the wrong type (for example, numbers imported as text). CAST fixes this before you aggregate.

-- Safe date filtering when the date column is stored as text
SELECT *
FROM parking_history
WHERE CAST(entry_time AS DATE) = '2025-06-15';

CONVERT (MySQL Alternative)

SELECT CONVERT(amount, DECIMAL(10,2)) AS amount_decimal,
       CONVERT(entry_time, DATE)       AS entry_date
FROM parking_history;

CONVERT(value, type) is MySQL-specific. CAST is the SQL standard and works in most databases.

Pivot (Rows to Columns)

A pivot turns row values into column headers — the most common business report format.

Monthly Revenue by Station (CASE + GROUP BY)

SELECT station_code,
       SUM(CASE WHEN MONTH(paid_time) = 1  THEN amount_received ELSE 0 END) AS jan,
       SUM(CASE WHEN MONTH(paid_time) = 2  THEN amount_received ELSE 0 END) AS feb,
       SUM(CASE WHEN MONTH(paid_time) = 3  THEN amount_received ELSE 0 END) AS mar,
       SUM(CASE WHEN MONTH(paid_time) = 4  THEN amount_received ELSE 0 END) AS apr,
       SUM(CASE WHEN MONTH(paid_time) = 5  THEN amount_received ELSE 0 END) AS may,
       SUM(CASE WHEN MONTH(paid_time) = 6  THEN amount_received ELSE 0 END) AS jun,
       SUM(CASE WHEN MONTH(paid_time) = 7  THEN amount_received ELSE 0 END) AS jul,
       SUM(CASE WHEN MONTH(paid_time) = 8  THEN amount_received ELSE 0 END) AS aug,
       SUM(CASE WHEN MONTH(paid_time) = 9  THEN amount_received ELSE 0 END) AS sep,
       SUM(CASE WHEN MONTH(paid_time) = 10 THEN amount_received ELSE 0 END) AS oct,
       SUM(CASE WHEN MONTH(paid_time) = 11 THEN amount_received ELSE 0 END) AS nov,
       SUM(CASE WHEN MONTH(paid_time) = 12 THEN amount_received ELSE 0 END) AS december
FROM payment_history
WHERE YEAR(paid_time) = 2025
GROUP BY station_code
ORDER BY station_code;

Pattern: one SUM(CASE WHEN ... THEN value ELSE 0 END) per column you want to create. Note: dec is a reserved word (short for DECIMAL), so December gets its full name as the alias.

Payment Method Breakdown per Station

SELECT station_code,
       SUM(CASE WHEN payment_method = 'Credit_Card' THEN 1 ELSE 0 END) AS credit_card_count,
       SUM(CASE WHEN payment_method = 'Cash'        THEN 1 ELSE 0 END) AS cash_count,
       SUM(CASE WHEN payment_method = 'Mobile_Pay'  THEN 1 ELSE 0 END) AS mobile_pay_count,
       COUNT(*)                                                         AS total_count
FROM payment_history
GROUP BY station_code;

Use SUM(CASE WHEN ... THEN 1 ELSE 0 END) to count occurrences, or SUM(... THEN amount ...) to add up values.

SELECT station_code,
       COUNT(*)                                                                  AS total,
       SUM(CASE WHEN payment_method = 'Credit_Card' THEN 1 ELSE 0 END)         AS credit_card_count,
       ROUND(
           SUM(CASE WHEN payment_method = 'Credit_Card' THEN 1 ELSE 0 END)
           / COUNT(*) * 100, 1
       )                                                                         AS credit_card_pct
FROM payment_history
GROUP BY station_code;

Divide the pivoted count by COUNT(*) to get the percentage share — a standard business reporting pattern. PostgreSQL trap: integer ÷ integer is integer division, so SUM(...) / COUNT(*) * 100 returns 0. Write * 100.0 or cast one side: SUM(...)::numeric / COUNT(*). MySQL's / always returns a decimal, so this only bites in Postgres.

MySQL vs PostgreSQL Quick Map

The logic in this notebook is portable; the function names often are not. The most common swaps:

Task	MySQL	PostgreSQL
Format date as 'YYYY-MM'	`DATE_FORMAT(d, '%Y-%m')`	`TO_CHAR(d, 'YYYY-MM')`
Truncate to month	`DATE_FORMAT(d, '%Y-%m-01')`	`DATE_TRUNC('month', d)`
Difference in minutes	`TIMESTAMPDIFF(MINUTE, a, b)`	`EXTRACT(EPOCH FROM (b - a)) / 60`
Add an interval	`DATE_ADD(d, INTERVAL 1 DAY)`	`d + INTERVAL '1 day'`
Extract hour / year	`HOUR(d)` / `YEAR(d)`	`EXTRACT(HOUR FROM d)` / `EXTRACT(YEAR FROM d)`
Current date	`CURDATE()`	`CURRENT_DATE`
Aggregate strings	`GROUP_CONCAT(x SEPARATOR ', ')`	`STRING_AGG(x, ', ')`
Subtotals	`GROUP BY a, b WITH ROLLUP`	`GROUP BY ROLLUP (a, b)`
Integer division	`/` returns decimal	`/` on two integers truncates — use `100.0` or `::numeric`
String concat	`CONCAT(a, b)`	`CONCAT(a, b)` or a \|\| b

PostgreSQL also supports GROUPING SETS and CUBE, which MySQL does not — see SQL Advanced Notebook.

Ready to apply these building blocks to real problems? See SQL Patterns for reusable solutions to common analysis and interview questions. For window frames, recursive CTEs, grouping sets, and query performance, see SQL Advanced Notebook. When you need to change data instead of reading it — INSERT, UPDATE, transactions, CREATE TABLE — see SQL Writes & DDL Notebook.

← Back to Blog