Time Series Notebook
Most business data is a time series: revenue per day, sessions per week, kWh per month. This notebook covers the data-analyst (DA) layer of time series work: describing trend and seasonality, building honest comparisons, forecasting simply, and flagging anomalies. It stops before heavy modeling (ARIMA, Prophet); for most reporting questions you will not need them.
The Pandas groundwork (resample, rolling, shift) is introduced in Pandas Notebook; this notebook builds on it. For whether a change is signal or noise, see Statistics Notebook.
All examples use daily parking revenue built from payment_df.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing
Prepare the Series
daily = (
    payment_df
    .set_index('paid_time')
    .resample('D')['amount']
    .sum()
)
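Here is a minimal sketch of the same pipeline on a tiny `payment_df` with made-up rows, assuming `paid_time` arrives as text. It shows two things the rules below rely on. First, `pd.to_datetime` is needed before `resample`. Second, `resample` returns a sorted daily index with no gaps, even when the raw rows are out of order and a day has no payments.

```python
import pandas as pd

# Hypothetical raw payments: timestamps stored as strings, rows out of order,
# and no payments at all on 2024-01-03.
payment_df = pd.DataFrame({
    'paid_time': ['2024-01-04 09:15', '2024-01-01 08:00',
                  '2024-01-02 17:30', '2024-01-01 12:45'],
    'amount': [12.0, 5.0, 8.0, 7.5],
})

# resample needs a DatetimeIndex, so convert the text column first
payment_df['paid_time'] = pd.to_datetime(payment_df['paid_time'])

daily = (
    payment_df
    .set_index('paid_time')
    .resample('D')['amount']
    .sum()  # empty calendar days become 0
)
print(daily)
```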
Two preparation rules before any analysis:
1. A sorted DatetimeIndex. Resample, rolling windows, and decomposition all assume it. set_index('paid_time') plus resample handles this, provided paid_time is already a datetime column (convert text with pd.to_datetime first). If you built the series another way, use daily = daily.sort_index().
2. An explicit decision about missing days. resample('D').sum() fills calendar gaps with 0 — correct for revenue (no payments = zero revenue). But for an average-type metric, a missing day is "no data", not "zero":
avg_amount = payment_df.set_index('paid_time').resample('D')['amount'].mean()
# days with no rows are NaN — keep them NaN, or fill deliberately:
avg_amount = avg_amount.interpolate() # only if a smooth estimate is acceptable
Filling "no data" with 0 silently drags down every rolling average that touches it — the most common silent bug in time series reporting.
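To see the size of that bias, take a hypothetical average-ticket series with made-up values and two days that have no payments. The zero-filled version drags its 3-day average down in every window that touches a gap. The NaN version, with `min_periods=2` chosen here for illustration, either averages the real values or reports NaN when too little data is left. It never reports a wrong low.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=7, freq='D')
# Hypothetical average payment amount per day; NaN = no payments that day
avg_amount = pd.Series([10.0, 10.0, np.nan, np.nan, 10.0, 10.0, 10.0], index=idx)

# Wrong for an average-type metric: "no data" silently becomes "zero"
zero_filled = avg_amount.fillna(0).rolling(3).mean()

# Keep the gap as NaN; a window needs at least 2 real values to produce output
nan_kept = avg_amount.rolling(3, min_periods=2).mean()

print(pd.DataFrame({'zero_filled': zero_filled, 'nan_kept': nan_kept}))
```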
Look at It First
fig, ax = plt.subplots(figsize=(12, 4))
daily.plot(ax=ax)
ax.set_title('Daily Revenue')
plt.tight_layout()
plt.show()
Before computing anything, read the plot for the four ingredients: trend (long-term direction), seasonality (repeating weekly/yearly pattern), events (one-off spikes and dips), and noise. Everything below is a tool for separating them.
Rolling Statistics
daily_df = daily.to_frame('revenue')
daily_df['ma_7'] = daily_df['revenue'].rolling(7).mean() # weekly smoothing
daily_df['ma_28'] = daily_df['revenue'].rolling(28).mean() # monthly trend
fig, ax = plt.subplots(figsize=(12, 4))
daily_df['revenue'].plot(ax=ax, alpha=0.4, label='daily')
daily_df['ma_7'].plot(ax=ax, label='7-day MA')
daily_df['ma_28'].plot(ax=ax, label='28-day MA')
ax.legend()
plt.tight_layout()
plt.show()
A 7-day window is the workhorse for daily business data: it contains every weekday exactly once, so day-of-week seasonality cancels out and what remains is trend. Plot the smoothed line with the raw one, never instead of it — see Visualization Selection Guide.
rolling(7) needs 7 values before it produces output (the first 6 are NaN). Add min_periods=1 only when a partial-window average is genuinely acceptable.
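Here is a quick check of the claim that a 7-day window cancels the weekly cycle. The synthetic series is a straight-line trend plus a fixed weekday pattern that sums to zero, with all numbers made up. The trailing 7-day mean comes out as a straight line with no weekday wiggle. It equals the trend 3 days earlier, because a trailing window is centred 3 days in the past.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=56, freq='D')      # 8 weeks, starting on a Monday
t = np.arange(len(idx))
trend = 1000 + 5 * t                                         # hypothetical upward trend
weekday_effect = np.array([-20, -10, -10, -10, 0, 30, 20])   # Mon..Sun, sums to 0
revenue = pd.Series(trend + weekday_effect[np.asarray(idx.dayofweek)], index=idx, dtype=float)

ma_7 = revenue.rolling(7).mean()
# A trailing 7-day mean is centred 3 days back, so it should equal trend(t - 3)
expected = pd.Series(1000 + 5 * (t - 3), index=idx, dtype=float)
print((ma_7 - expected).abs().max())   # ~0: the weekday pattern has cancelled out
```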
Seasonality Profiles
Day-of-week and month profiles answer "what is normal?" — the baseline every comparison needs.
# day-of-week profile
dow = daily.groupby(daily.index.dayofweek).mean()
dow.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# month profile (needs 1+ year of data to be meaningful)
monthly_profile = daily.groupby(daily.index.month).mean()
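Assigning the seven labels by hand, as above, assumes that all seven weekdays appear in the data and come back in Monday-first order. The sketch below is a slightly sturdier version that assumes pandas' `day_name()` with its default English names. It groups by name and reindexes into calendar order, so a missing weekday shows up as NaN instead of shifting every label. It then divides each day by its weekday norm. The series is synthetic.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=28, freq='D')                    # 4 full weeks
daily = pd.Series(np.where(idx.dayofweek >= 5, 300.0, 100.0), index=idx)   # hypothetical: busy weekends

order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow = daily.groupby(daily.index.day_name()).mean().reindex(order)

# Each day relative to its own weekday's average: 1.0 means "a normal day of that kind"
dow_index = daily / dow.reindex(daily.index.day_name()).to_numpy()
print(dow)
```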
Practical consequences: compare Monday to last Monday, not to Sunday; compare this June to last June, not to May. Calendar length causes a similar distortion at month grain: a 31-day month out-earns a 30-day month in total even when every day is identical, so compare daily averages instead.
# month-over-month, fairly: average per day, not total
monthly_avg = daily.resample('ME').mean() # 'ME' = month end; pandas < 2.2 spells it 'M'
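Here is a small check of why totals mislead across months of different length. It uses a made-up series with identical revenue every day from January through April 2024. The monthly totals differ purely because of calendar length (February 2024 has 29 days), while the per-day averages are identical. The sketch groups with `to_period('M')`. That gives the same calendar-month buckets as `resample('ME')`, labelled by period rather than by month-end date, and it also runs on pandas versions that predate the 'ME' alias.

```python
import pandas as pd

idx = pd.date_range('2024-01-01', '2024-04-30', freq='D')
daily = pd.Series(1000.0, index=idx)        # hypothetical flat revenue: 1,000 every day

by_month = daily.groupby(daily.index.to_period('M'))
monthly_total = by_month.sum()   # 31k, 29k, 31k, 30k: looks like revenue is moving
monthly_avg = by_month.mean()    # 1,000 every month: nothing actually changed
print(pd.DataFrame({'total': monthly_total, 'avg_per_day': monthly_avg}))
```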
Decomposition — Trend, Seasonality, Residual
seasonal_decompose splits the series into three parts you can inspect separately:
result = seasonal_decompose(daily, model='additive', period=7) # 7 = weekly cycle
result.plot()
plt.tight_layout()
plt.show()
result.trend # the smoothed direction
result.seasonal # the repeating weekly pattern
result.resid # what's left — events and noise live here
- model='additive' when the seasonal swing is roughly constant in absolute terms; 'multiplicative' when the swing grows with the level (seasonality looks like ±20% rather than ±$5,000). Multiplicative decomposition requires strictly positive values, so a single zero-revenue day from resample('D').sum() rules it out.
- period is the cycle length in observations: 7 for daily data with a weekly cycle, 12 for monthly data with a yearly cycle.
- The residual panel is the most useful one for a DA: a spike there is a real event, not seasonality, which is exactly what Step 1 of Metrics & Diagnosis Guide needs.
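To make the three panels less of a black box, here is a simplified additive decomposition in plain pandas, run on a synthetic series with no noise. It follows the classical recipe. The trend is a centred moving average over one cycle. The seasonal part is the average detrended value for each position in the cycle, centred to sum to zero. The residual is what remains. This is meant to show the idea, not to reproduce statsmodels' output digit for digit.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=70, freq='D')
t = np.arange(len(idx))
weekday_effect = np.array([-20.0, -10, -10, -10, 0, 30, 20])       # Mon..Sun, sums to 0
y = pd.Series(500 + 2.0 * t + weekday_effect[t % 7], index=idx)     # hypothetical series

period = 7
# 1. Trend: centred moving average over one full cycle (odd period, so a single MA suffices)
trend = y.rolling(period, center=True).mean()
# 2. Seasonal: mean detrended value per position in the cycle, re-centred to sum to 0
detrended = y - trend
phase_means = detrended.groupby(t % period).mean()
phase_means -= phase_means.mean()
seasonal = pd.Series(phase_means.to_numpy()[t % period], index=idx)
# 3. Residual: whatever trend and seasonality do not explain
resid = y - trend - seasonal
print(resid.abs().max())   # ~0 here, because the synthetic series has no noise or events
```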
Lags and Autocorrelation
daily_df['lag_1'] = daily_df['revenue'].shift(1) # yesterday
daily_df['lag_7'] = daily_df['revenue'].shift(7) # same day last week
daily_df['wow_pct'] = (daily_df['revenue'] / daily_df['lag_7'] - 1) * 100
daily.autocorr(lag=7) # correlation with 7 days ago — high = strong weekly cycle
Autocorrelation quantifies what the seasonality profile shows. An autocorr(7) near 0.8 means same-day-last-week alone accounts for most of the variation in daily revenue (r² ≈ 0.64 in a straight-line fit). That is also why lag-7 is the right baseline for daily comparisons, and it is the basis of the seasonal naive forecast below.
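Here is a small synthetic check of how autocorr separates a weekly cycle from other lags. The values are made up, with a little seeded noise. On a series that repeats the same weekday pattern every week, lag 7 lines up each day with the same weekday. Lag 3 compares unrelated weekdays and comes out much lower, negative for this particular pattern.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=84, freq='D')             # 12 weeks
weekday_level = np.array([100.0, 100, 110, 120, 180, 260, 200])     # hypothetical Mon..Sun levels
rng = np.random.default_rng(0)
daily = pd.Series(weekday_level[np.asarray(idx.dayofweek)] + rng.normal(0, 5, len(idx)), index=idx)

for lag in (1, 3, 7):
    print(lag, round(daily.autocorr(lag=lag), 2))
```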
Simple Forecasting
Climb this ladder only as far as the question requires:
Level 1 — Seasonal Naive (the Baseline)
"Next Monday = last Monday."
forecast_snaive = daily.shift(7)
Trivial — and surprisingly hard to beat on stable weekly-cycle data. Every fancier model must outperform this to justify itself, same logic as the dummy baseline in Machine Learning Notebook.
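`daily.shift(7)` gives the seasonal-naive prediction for days you already have, which is what you score against. To forecast days that have not happened yet, repeat the last observed week forward. Below is a minimal sketch. The helper name `seasonal_naive_forecast` is hypothetical and the history is synthetic.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(series: pd.Series, horizon: int, season: int = 7) -> pd.Series:
    """Forecast `horizon` days ahead by repeating the last `season` observations."""
    last_cycle = series.iloc[-season:].to_numpy()
    future_idx = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=horizon, freq='D')
    # step h (1-based) reuses the value one full cycle earlier: last_cycle[(h - 1) % season]
    values = np.resize(last_cycle, horizon)
    return pd.Series(values, index=future_idx)

history = pd.Series(np.arange(1.0, 22.0), index=pd.date_range('2024-01-01', periods=21, freq='D'))
print(seasonal_naive_forecast(history, horizon=10))
```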
Level 2 — Holt-Winters (Trend + Seasonality)
Exponential smoothing with explicit trend and seasonal components — the most model you usually need for short-horizon business forecasts:
train = daily.iloc[:-28]
test = daily.iloc[-28:] # hold out the last 4 weeks
model = ExponentialSmoothing(
    train,
    trend='add',
    seasonal='add', # 'mul' if the weekly swing scales with the level (requires strictly positive data)
    seasonal_periods=7,
).fit()
forecast = model.forecast(28)
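For reference, the additive Holt-Winters model with seasonal period m (seasonal_periods=7 above) can be written as three smoothing updates plus a forecast equation. This is the standard textbook component form. statsmodels estimates α, β, γ and the initial states for you, and its internal parameterisation and initialisation may differ in detail. Each of α, β, γ lies between 0 and 1 and controls how quickly its component forgets old data.

```latex
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) && \text{level} \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} && \text{trend (slope)} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m} && \text{season} \\
\hat{y}_{t+h \mid t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor && \text{forecast}
\end{aligned}
```

The seasonal 'mul' variant replaces the subtraction of s with division and the addition of s with multiplication.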
Evaluate — Split by Time, Never Randomly
mae = (test - forecast).abs().mean()
mape = ((test - forecast).abs() / test).mean() * 100 # becomes inf if any test day has zero revenue: drop those days or rely on MAE
# the baseline to beat: same day last week
snaive = daily.shift(7).iloc[-28:]
mae_baseline = (test - snaive).abs().mean()
print(f"Holt-Winters MAE: {mae:,.0f} Seasonal naive MAE: {mae_baseline:,.0f}")
The test set must be the most recent block — a random split lets the model train on the future and produces scores you can never reproduce in real life. This is the time series version of data leakage.
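One compact way to put the model and its baseline side by side is sketched below as a hypothetical helper, `compare_to_seasonal_naive` (not part of any library). It scores the seasonal naive on exactly the forecast dates. It leaves zero-revenue days out of MAPE, because percentage error is undefined there. It also reports the ratio of the two MAEs: below 1 means the model beats "same day last week". The demo series is synthetic and has a steady upward trend, which the plain seasonal naive misses by one week of growth.

```python
import numpy as np
import pandas as pd

def compare_to_seasonal_naive(daily: pd.Series, forecast: pd.Series, season: int = 7) -> dict:
    """MAE/MAPE of `forecast` vs the seasonal naive, both scored on the forecast's dates."""
    test = daily.loc[forecast.index]                   # actuals for the held-out block
    snaive = daily.shift(season).loc[forecast.index]   # "same day last week"
    nonzero = test != 0                                # MAPE is undefined where actual == 0
    mae_model = (test - forecast).abs().mean()
    mae_naive = (test - snaive).abs().mean()
    return {
        'mae_model': mae_model,
        'mae_naive': mae_naive,
        'mape_model': ((test - forecast).abs() / test)[nonzero].mean() * 100,
        'mae_ratio': mae_model / mae_naive,            # < 1: model beats the baseline
    }

idx = pd.date_range('2024-01-01', periods=56, freq='D')
t = np.arange(len(idx))
daily = pd.Series(100 + 1.0 * t + np.array([0, 0, 5, 10, 30, 60, 40])[t % 7], index=idx)  # hypothetical
test_idx = idx[-14:]
trend_aware = daily.shift(7).loc[test_idx] + 7   # naive plus one week of trend growth
print(compare_to_seasonal_naive(daily, trend_aware))
```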
fig, ax = plt.subplots(figsize=(12, 4))
train.iloc[-90:].plot(ax=ax, label='train')
test.plot(ax=ax, label='actual')
forecast.plot(ax=ax, linestyle='--', label='forecast')
ax.legend()
plt.tight_layout()
plt.show()
Level 3 — When You Need More
ARIMA/SARIMA (statsmodels) and Prophet handle multiple seasonality, holiday effects, and longer horizons — worth learning when forecasting becomes the job rather than a report section. The evaluation discipline (time-based split, beat the seasonal naive) carries over unchanged.
Anomaly Detection — Rolling Bands
Flag days that fall outside what recent history says is normal:
window = 28
roll_mean = daily.rolling(window).mean().shift(1) # shift(1): today is judged
roll_std = daily.rolling(window).std().shift(1) # by PAST days only
upper = roll_mean + 3 * roll_std
lower = roll_mean - 3 * roll_std
anomalies = daily[(daily > upper) | (daily < lower)]
The shift(1) matters. Without it, today's own (possibly anomalous) value sits inside the window that judges it and inflates the mean and SD it is compared against. ±3 SD flags only the genuinely surprising days. Tighten to ±2 if you want sensitive alerts and can tolerate false alarms; this is the Type I vs Type II error trade-off from Statistics Notebook. For metrics with a strong weekly cycle, run the bands on the decomposition residual instead of the raw series, so a normal busy Saturday doesn't trigger an alert.
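Below, the same logic is wrapped as a reusable function, named `rolling_band_anomalies` here for illustration (hypothetical), and tried on a synthetic series. The series has a small day-to-day wiggle plus one injected spike. Because the bands use only past days, the spike is judged against a calm history and is the only day flagged. The later days whose windows contain the spike get very wide bands, so nothing else trips.

```python
import numpy as np
import pandas as pd

def rolling_band_anomalies(series: pd.Series, window: int = 28, k: float = 3.0) -> pd.Series:
    """Return the points outside mean ± k*std of the previous `window` observations."""
    past = series.rolling(window)
    roll_mean = past.mean().shift(1)   # shift(1): judge each day by earlier days only
    roll_std = past.std().shift(1)
    upper = roll_mean + k * roll_std
    lower = roll_mean - k * roll_std
    # NaN bands (the first `window` days) compare as False, so those days are never flagged
    return series[(series > upper) | (series < lower)]

idx = pd.date_range('2024-01-01', periods=90, freq='D')
values = 100.0 + np.arange(90) % 3   # hypothetical calm series: 100, 101, 102, 100, ...
values[40] = 1000.0                  # injected one-off event
daily = pd.Series(values, index=idx)
print(rolling_band_anomalies(daily))
```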
Common Mistakes
1. Random Train/Test Split on Time Data
Shuffled splits leak the future into training. Split by time: train on the past, test on the most recent block. The same applies to cross-validation — use expanding-window CV (sklearn.model_selection.TimeSeriesSplit), never KFold(shuffle=True).
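The sketch below shows what expanding-window CV does, written in plain NumPy so the fold structure is visible. The helper `expanding_window_splits` is hypothetical. Its fold sizes are intended to mirror the default layout of `TimeSeriesSplit`: equal-size test blocks taken from the end of the series, each trained on everything before it. Treat that as an approximation and use the sklearn class itself in real work.

```python
import numpy as np

def expanding_window_splits(n_samples: int, n_splits: int = 5):
    """Yield (train_idx, test_idx) pairs; every test block lies after all of its training data."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = np.arange(0, test_start)                       # everything before the block
        test_idx = np.arange(test_start, test_start + test_size)   # the next block in time
        yield train_idx, test_idx

for train_idx, test_idx in expanding_window_splits(n_samples=120, n_splits=4):
    print(f"train 0..{train_idx[-1]:>3}  ->  test {test_idx[0]}..{test_idx[-1]}")
```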
2. Zero-Filling "No Data"
resample('D').sum() makes missing days 0 — right for totals, wrong for averages and rates. Decide per metric; NaN that stays visible beats a silent wrong zero.
3. Comparing Partial Periods
Month-to-date vs. last full month is the classic false alarm. Compare equal, complete windows. If a dashboard must show the current month, compare run rates: the month-to-date total divided by days elapsed, not by days in the month.
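Here is a sketch of the run-rate comparison on made-up numbers. Revenue is flat at 1,000 a day, and "today" is 10 June 2024. The raw month-to-date total looks like a two-thirds collapse against May, while the per-day averages show that nothing changed. The variable names and the as-of date are illustrative.

```python
import pandas as pd

daily = pd.Series(1000.0, index=pd.date_range('2024-05-01', '2024-06-10', freq='D'))  # hypothetical
as_of = pd.Timestamp('2024-06-10')

months = daily.index.to_period('M')
this_month = daily[months == as_of.to_period('M')]        # 1-10 June (partial)
last_month = daily[months == as_of.to_period('M') - 1]    # all of May

naive_change = this_month.sum() / last_month.sum() - 1        # 10 days vs 31 days: about -68%
run_rate_change = this_month.mean() / last_month.mean() - 1   # per-day average: 0%
print(f"MTD total vs May: {naive_change:.0%}   run rate vs May: {run_rate_change:.0%}")
```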
4. Ignoring the Weekly Cycle
"Revenue fell 30% vs yesterday" — yesterday was Saturday. Daily business metrics almost always need same-weekday comparison (lag-7) or a 7-day MA before any conclusion.
5. Forecasting Without a Baseline
A model with 12% MAPE sounds fine until the seasonal naive scores 11%. Always report the naive baseline next to the model.
6. Over-Smoothing
A 90-day MA on daily data erases the events you are paid to notice. Match the window to the question: 7 days to remove weekday noise, 28 for monthly trend — and keep the raw series on the plot.
7. Trusting Decomposition Near the Edges
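The trend line from seasonal_decompose is a centered moving average. With default settings it is NaN for the first and last period // 2 observations (3 days at each end for period=7), exactly where you care most. Setting extrapolate_trend only swaps those NaNs for a linear extrapolation. For "what is the trend right now", use a trailing MA or the Holt-Winters level instead.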
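Here is the edge problem in miniature, on a synthetic steadily rising series. A centred 7-day mean, the kind of window seasonal_decompose uses for its trend with period=7, has no value for the last three days. A trailing 7-day mean is defined right up to the latest date, at the cost of lagging about 3 days behind.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=30, freq='D')
daily = pd.Series(np.linspace(100.0, 200.0, 30), index=idx)   # hypothetical steady climb

centred = daily.rolling(7, center=True).mean()   # what a decomposition trend looks like
trailing = daily.rolling(7).mean()               # usable "trend right now", but lagging

print(pd.DataFrame({'daily': daily, 'centred': centred, 'trailing': trailing}).tail(4))
```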
This pairs with Metrics & Diagnosis Guide — decomposition and rolling bands answer its Step 1 ("shape the drop in time") with code, and the seasonality profiles are what make its window comparisons fair.