RFM Analysis Notebook

RFM is the most practical way to group customers by behavior. It uses three numbers from each customer's transactions to sort them into segments. No survey, no guessing — just what they actually did.

This notebook covers two versions: - License plate version — for parking data with no membership system - Member ID version — for businesses with an app or registered accounts (more accurate)

All examples use parking system tables: parking_history, payment_history, and member_vehicles (the app membership table).

See SQL Patterns for the building blocks used here, and Pandas Notebook for the Python data work.


What is RFM?

Metric Question Parking definition
Recency (R) How recently did this customer pay? Days since last paid parking session
Frequency (F) How often do they pay? Number of paid sessions in the analysis window
Monetary (M) How much have they spent? Total amount paid in the analysis window

A customer who parked yesterday (R=high), parks 3 times a week (F=high), and spends a lot (M=high) is your most valuable group. A customer who used to park often but stopped three months ago is a churn risk worth targeting.


Version 1 — License Plate Based

Use this when there is no membership system. Each license plate counts as one customer.

Step 1 — Calculate Raw RFM Metrics in SQL

-- analysis window: last 12 months
-- reference date: today (or a fixed reporting date)
WITH paid_sessions AS (
    SELECT
        ph.license_plate,
        ph.parking_id,
        pay.paid_time,
        pay.amount_received
    FROM parking_history ph
    JOIN payment_history pay ON ph.parking_id = pay.parking_id
    WHERE pay.paid_time >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
)
SELECT
    license_plate,
    DATEDIFF(CURDATE(), MAX(paid_time))   AS recency_days,
    COUNT(*)                              AS frequency,
    SUM(amount_received)                  AS monetary
FROM paid_sessions
GROUP BY license_plate;

Step 2 — Score Each Metric (1 to 5)

Give each metric a score from 1 to 5 with NTILE(5). Score 5 = best.

  • Recency: fewer days = better → score 5 for the most recent customers
  • Frequency: higher count = better → score 5 for the most frequent customers
  • Monetary: higher spend = better → score 5 for the highest spenders
WITH paid_sessions AS (
    SELECT ph.license_plate, pay.paid_time, pay.amount_received
    FROM parking_history ph
    JOIN payment_history pay ON ph.parking_id = pay.parking_id
    WHERE pay.paid_time >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
),
rfm_raw AS (
    SELECT
        license_plate,
        DATEDIFF(CURDATE(), MAX(paid_time)) AS recency_days,
        COUNT(*)                            AS frequency,
        SUM(amount_received)                AS monetary
    FROM paid_sessions
    GROUP BY license_plate
)
SELECT
    license_plate,
    recency_days,
    frequency,
    monetary,
    -- recency: fewer days = better = score 5
    6 - NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
    NTILE(5) OVER (ORDER BY frequency ASC)          AS f_score,
    NTILE(5) OVER (ORDER BY monetary ASC)           AS m_score
FROM rfm_raw;

Step 3 — Assign Segments

WITH rfm_scored AS (
    -- ... (previous CTE) ...
)
SELECT
    license_plate,
    r_score, f_score, m_score,
    CONCAT(r_score, f_score, m_score) AS rfm_code,
    CASE
        WHEN r_score >= 4 AND f_score >= 4             THEN 'Champions'
        WHEN r_score >= 3 AND f_score >= 3             THEN 'Loyal'
        WHEN r_score >= 4 AND f_score <= 2             THEN 'New Customers'
        WHEN r_score <= 2 AND f_score >= 3             THEN 'At Risk'
        WHEN r_score <= 2 AND f_score <= 2             THEN 'Lost'
        ELSE                                                'Potential Loyalists'
    END AS segment
FROM rfm_scored;

Version 2 — Member ID Based (APP Users)

When customers sign up through an app, one person can own several license plates. Use member_id as the customer key — it follows the person, not the car.

Assumed table: member_vehicles(member_id, license_plate) — links one member to one or more plates.

Step 1 — Join to Member Table

WITH member_sessions AS (
    SELECT
        mv.member_id,
        pay.paid_time,
        pay.amount_received
    FROM parking_history ph
    JOIN payment_history pay   ON ph.parking_id   = pay.parking_id
    JOIN member_vehicles mv    ON ph.license_plate = mv.license_plate
    WHERE pay.paid_time >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
),
rfm_raw AS (
    SELECT
        member_id,
        DATEDIFF(CURDATE(), MAX(paid_time)) AS recency_days,
        COUNT(*)                            AS frequency,
        SUM(amount_received)                AS monetary
    FROM member_sessions
    GROUP BY member_id
)
SELECT
    member_id,
    recency_days,
    frequency,
    monetary,
    6 - NTILE(5) OVER (ORDER BY recency_days ASC)  AS r_score,
    NTILE(5) OVER (ORDER BY frequency ASC)          AS f_score,
    NTILE(5) OVER (ORDER BY monetary ASC)           AS m_score
FROM rfm_raw;

Compared to the license plate version: - A member who changed cars is still seen as the same customer - All parking sessions across stations are linked to one person - You can send push notifications to the member account after segmentation


Python Implementation

Use Python when you need to try out different scoring rules, draw distribution charts, or feed segments into the next step.

import pandas as pd
import numpy as np
from datetime import date

# load raw data (already joined in SQL or loaded as CSV)
df = pd.read_csv('parking_payments.csv', parse_dates=['paid_time'])

reference_date = pd.Timestamp(date.today())
analysis_start = reference_date - pd.DateOffset(months=12)
df = df[df['paid_time'] >= analysis_start]

Calculate Raw RFM Metrics

rfm = df.groupby('member_id').agg(
    last_visit  = ('paid_time',       'max'),
    frequency   = ('parking_id',      'count'),
    monetary    = ('amount_received',  'sum')
).reset_index()

rfm['recency_days'] = (reference_date - rfm['last_visit']).dt.days
rfm = rfm.drop(columns='last_visit')

Score Each Metric

rfm['r_score'] = pd.qcut(rfm['recency_days'], q=5,
                          labels=[5, 4, 3, 2, 1])   # fewer days = better = score 5
rfm['f_score'] = pd.qcut(rfm['frequency'].rank(method='first'), q=5,
                          labels=[1, 2, 3, 4, 5])
rfm['m_score'] = pd.qcut(rfm['monetary'].rank(method='first'), q=5,
                          labels=[1, 2, 3, 4, 5])

rfm[['r_score', 'f_score', 'm_score']] = rfm[['r_score', 'f_score', 'm_score']].astype(int)
rfm['rfm_score'] = rfm['r_score'] + rfm['f_score'] + rfm['m_score']

pd.qcut cuts the data into equal-sized buckets. Use rank(method='first') to break ties before passing to qcut — without it, repeated values create uneven buckets.

Assign Segments

def assign_segment(row):
    r, f = row['r_score'], row['f_score']
    if r >= 4 and f >= 4:
        return 'Champions'
    elif r >= 3 and f >= 3:
        return 'Loyal'
    elif r >= 4 and f <= 2:
        return 'New Customers'
    elif r <= 2 and f >= 3:
        return 'At Risk'
    elif r <= 2 and f <= 2:
        return 'Lost'
    else:
        return 'Potential Loyalists'

rfm['segment'] = rfm.apply(assign_segment, axis=1)

Summarise Segments

summary = rfm.groupby('segment').agg(
    customer_count = ('member_id',     'count'),
    avg_recency    = ('recency_days',  'mean'),
    avg_frequency  = ('frequency',     'mean'),
    avg_monetary   = ('monetary',      'mean'),
    total_revenue  = ('monetary',      'sum')
).round(1).sort_values('total_revenue', ascending=False)

print(summary)

Customer Segments and Business Actions

Segment Profile Action
Champions Parked recently, often, spent the most Reward with loyalty points or priority space
Loyal Regular visitors, steady spend Promote a monthly pass to lock in revenue
Potential Loyalists Recent, medium frequency Send a welcome offer to build the habit
New Customers Recent first-timers, low frequency Onboarding push — help them come back a second and third time
At Risk Used to come often, haven't been back Win-back campaign: a limited discount
Lost Low recency, low frequency Cheap reactivation try, or write off

Tracking Segments Over Time

Run RFM every month and watch how customers move between segments. This is more useful than a one-time snapshot.

# add a reporting month column to each run
rfm['report_month'] = pd.Timestamp(date.today()).to_period('M')

# save results
rfm.to_csv(f"rfm_{date.today().strftime('%Y%m')}.csv", index=False)
-- compare segment size month over month
SELECT
    report_month,
    segment,
    COUNT(*) AS customer_count
FROM rfm_history
GROUP BY report_month, segment
ORDER BY report_month, segment;

Movements to watch: - Champions → At Risk: an early sign of churn - New Customers → Potential Loyalists: your onboarding is working - At Risk → Lost: your win-back campaign was too late


Common Mistakes

1. Including Sessions with No Payment

Always join parking_history to payment_history before calculating RFM. Sessions that ended without payment will mess up recency and frequency.

2. Choosing the Wrong Analysis Window

Too short (1 month) misses seasonal customers. Too long (3 years) hides recent changes in behavior. 12 months is the standard starting point. Adjust it if your business has a strong seasonal pattern.

3. Using Fixed Score Boundaries Instead of Percentiles

Setting R score 5 = "parked within 7 days" is fragile. It depends on your current data and breaks when traffic changes. NTILE and pd.qcut follow the actual data distribution, so they adjust on their own.

4. Treating All Segments as Equally Important

Segments with high monetary value deserve more attention. Always weigh business actions by revenue impact, not just by customer count.