Visualization Selection Guide

2026-04-25 Data Analysis 8 min read

Python Visualization

Picking the right chart is a different skill from knowing how to draw one. This guide covers the when: which chart fits your data and your question, and what goes wrong when you pick the wrong one.

For the syntax and code, see Matplotlib / Seaborn Notebook. All examples use the same parking dataset.

Chart Selection Table

Use this table to find the right chart before you read the full section.

Your data and question	Chart
How does a metric change over time?	1. Line Chart
How do categories compare on one metric?	2. Bar Chart
What share does each category have of the total?	3. Stacked Bar / Pie
What does the spread of one variable look like?	4. Histogram
How does a spread differ across groups?	5. Box Plot
Is there a link between two numeric variables?	6. Scatter Plot
How do values change across a 2-D grid?	7. Heatmap
How do several metrics move together over time?	Combine: Line + dual axis
How does one station compare on several measures at once?	Combine: Small multiples

When no single chart tells the whole story, see Dashboard Composition at the end.
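The "Line + dual axis" row in the table can be sketched with Matplotlib's twinx(). The monthly frame below is invented illustration data, not the parking dataset, and the sessions column is an assumed name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly summary; values are invented for illustration
monthly = pd.DataFrame({
    "month_str": ["2025-01", "2025-02", "2025-03", "2025-04"],
    "revenue": [100_000, 120_000, 90_000, 130_000],
    "sessions": [2_000, 2_300, 1_900, 2_500],
})

fig, ax_rev = plt.subplots(figsize=(8, 4))
ax_rev.plot(monthly["month_str"], monthly["revenue"], marker="o", color="tab:blue")
ax_rev.set_ylabel("Revenue (NTD)", color="tab:blue")

# twinx() creates a second y-axis that shares the same x-axis
ax_ses = ax_rev.twinx()
ax_ses.plot(monthly["month_str"], monthly["sessions"], marker="s", color="tab:orange")
ax_ses.set_ylabel("Sessions (count)", color="tab:orange")

ax_rev.set_xlabel("Month")
fig.tight_layout()
```

Matching each axis label's color to its line helps readers tell which scale belongs to which series.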

1. Line Chart — Trend Over Time

Use when: your x-axis is time and you want to show how a metric changes.

ax.plot(monthly['month_str'], monthly['revenue'], marker='o')

Avoid when:
- You have fewer than 4 time points. Use a bar chart; a line suggests a smooth flow that does not exist.
- The x-axis is a category with no natural order (station names, payment methods). A bar chart is clearer.

What to watch for

Add a rolling average on top when the raw line is too noisy to read:

monthly['rolling_3m'] = monthly['revenue'].rolling(3, center=True).mean()
ax.plot(monthly['month_str'], monthly['rolling_3m'], linestyle='--', label='3M Avg')

The rolling average should sit next to the raw line, not replace it. Show both.
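One detail worth knowing about the centered window: it leaves gaps at both ends of the series. A minimal sketch with invented revenue figures shows where they come from.

```python
import pandas as pd

# Invented revenue figures, one per month
revenue = pd.Series([100, 120, 90, 130, 110], dtype=float)

rolling_3m = revenue.rolling(3, center=True).mean()
# The first and last values are NaN: a centered 3-month window has no
# left neighbor at the start and no right neighbor at the end.
print(rolling_3m.round(2).tolist())
# [nan, 103.33, 113.33, 110.0, nan]
```

Passing min_periods=1 would fill the ends from partial windows, at the cost of averaging fewer months there.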

Common mistake: a cut-off y-axis

If the y-axis does not start at zero, small changes look much bigger than they are. Start at 0 by default; a non-zero baseline is reasonable only when the changes are small compared to the absolute value (for example, stock prices).

ax.set_ylim(bottom=0)   # force a zero baseline

2. Bar Chart — Category Comparison

Use when: you want to compare a number across separate categories (stations, payment methods, parking types).

sns.barplot(data=station_rev, x='station_code', y='amount', ax=ax)

Avoid when:
- You have more than ~12 categories. The chart gets unreadable; filter to top N, or switch to a table.
- You want to show change over time across many time points. Use a line chart.

Vertical vs. Horizontal

Use horizontal bars when category labels are long. They are easier to read than rotated x-axis labels.

sns.barplot(data=station_rev, y='station_code', x='amount', orient='h', ax=ax)

Sort your bars

Unsorted bars make comparison harder. Always sort by the metric, unless the category has a natural order (months, age groups).

station_rev = station_rev.sort_values('amount', ascending=False)
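Sorting combines naturally with the top-N filter mentioned earlier. The sketch below uses invented station totals and an arbitrary cutoff of 3; both are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-station totals (invented values)
station_rev = pd.DataFrame({
    "station_code": ["A", "B", "C", "D", "E"],
    "amount": [500, 1200, 300, 900, 50],
})

TOP_N = 3  # arbitrary cutoff for illustration
# nlargest returns the top rows already sorted in descending order
top_stations = station_rev.nlargest(TOP_N, "amount")
print(top_stations["station_code"].tolist())
# ['B', 'D', 'A']
```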

Add value labels for precision

Bar height gives you a rough comparison. For exact numbers in a report, add value labels:

for container in ax.containers:
    ax.bar_label(container, fmt='%.0f', padding=3)

3. Stacked Bar & Pie — Composition

Use these when you care about what share each part has of the whole.

Stacked Bar — Composition That Changes Across Categories

Use when: you want to show both the total and the breakdown by sub-category, across several groups.

pivot = parking_df.pivot_table(
    values='amount', index='station_code',
    columns='payment_method', aggfunc='sum', fill_value=0
)
pivot.plot(kind='bar', stacked=True)

Avoid when: you need to compare one specific sub-category across groups. The floating baseline makes it hard to judge anything except the bottom segment and the total.

If you need that kind of per-segment comparison, use a grouped bar chart or several small charts instead.
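A grouped version of the same pivot might look like the sketch below. The transactions are invented; the column names follow the stacked example above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical transactions (invented values)
parking_df = pd.DataFrame({
    "station_code": ["A", "A", "B", "B", "C", "C"],
    "payment_method": ["cash", "card", "cash", "card", "cash", "card"],
    "amount": [300, 700, 450, 250, 100, 600],
})

pivot = parking_df.pivot_table(
    values="amount", index="station_code",
    columns="payment_method", aggfunc="sum", fill_value=0,
)

fig, ax = plt.subplots(figsize=(8, 4))
# stacked=False (the default) puts every segment on the zero baseline
pivot.plot(kind="bar", stacked=False, ax=ax)
ax.set_ylabel("Revenue (NTD)")
```

Because every bar starts at zero, any payment method can be compared across stations directly. The trade-off is that the station totals are no longer visible.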

Pie / Donut Chart

Use when:
- There are 5 or fewer categories.
- You want to highlight one big segment ("Station A is 60% of revenue").
- The audience cares about share of the whole, not exact values.

Avoid when:
- You have more than 5 categories. Too many slices become unreadable.
- You need to compare values. People are bad at judging angles; a bar chart is almost always more accurate.
- You need to show change over time.

Situation	Stacked Bar	Pie
Several groups with sub-categories	✓	✗
One snapshot of the breakdown	✓	✓
≤ 5 categories	✓	✓
Comparing exact values	✗	✗

4. Histogram — One Variable's Distribution

Use when: you want to see how one numeric variable is spread out — its shape, center, range, and whether there are outliers (extreme values).

sns.histplot(parking_df['amount'], bins=30, kde=True, ax=ax)

Avoid raw counts when: you want to compare groups of very different sizes. The larger group dominates the count scale, so the comparison misleads. Use stat='density' or stat='probability' to normalize.

sns.histplot(data=parking_df, x='amount', hue='parking_type',
             stat='density', common_norm=False, bins=30, kde=True)

common_norm=False normalizes each group on its own, so the shapes are comparable no matter the group size.
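The effect of per-group normalization can be checked numerically. The sketch below uses NumPy's np.histogram as a stand-in for what the plot shows, with two synthetic groups drawn from the same distribution at a 10x size difference. The group names and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical groups with a 10x size difference
hourly = rng.gamma(shape=2.0, scale=30.0, size=10_000)
monthly_pass = rng.gamma(shape=2.0, scale=30.0, size=1_000)

edges = np.linspace(0, 400, 31)  # 30 shared bins
widths = np.diff(edges)

for name, values in [("hourly", hourly), ("monthly_pass", monthly_pass)]:
    counts, _ = np.histogram(values, bins=edges)
    density, _ = np.histogram(values, bins=edges, density=True)
    # Each group's density integrates to 1 regardless of group size
    area = float((density * widths).sum())
    print(name, "tallest bin count:", counts.max(), "area:", round(area, 3))
```

The tallest count differs by roughly the size ratio, while both density curves cover the same area.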

Choosing the bin count

Too few bins hide the shape. Too many add noise. bins=30 is a fair default for most business data. If the chart shows two clear humps (bimodal), check whether you should split the data by a category first.
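If a fixed bin count feels arbitrary, NumPy's reference rules offer a data-driven starting point. The sketch below compares three of them on synthetic right-skewed data; seaborn's histplot accepts the same rule names through its bins argument.

```python
import numpy as np

rng = np.random.default_rng(42)
# Invented, right-skewed amounts
amounts = rng.lognormal(mean=4.0, sigma=0.6, size=2_000)

for rule in ("sturges", "fd", "auto"):
    edges = np.histogram_bin_edges(amounts, bins=rule)
    print(f"{rule:8s} -> {len(edges) - 1} bins")
```

Treat the result as a first guess, then adjust by eye.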

What to look for

Right skew (long tail to the right): most parking amounts are small, with a few large ones — the median is more meaningful than the mean.
Outliers: gaps in the tail can mean data quality issues or a separate sub-group.
Bimodal: two peaks often mean two different behaviors mixed together (for example, short-term parkers vs. monthly parkers).
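The right-skew point is easy to verify with numbers. In the invented sample below, one large amount pulls the mean far above the median.

```python
import pandas as pd

# Invented amounts: mostly small, one very large
amounts = pd.Series([20, 30, 30, 40, 40, 40, 50, 60, 500])

print("mean:  ", amounts.mean())    # 90.0
print("median:", amounts.median())  # 40.0
print("right-skewed:", bool(amounts.skew() > 0))  # True
```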

5. Box Plot — Distribution Across Groups

Use when: you want to compare how one numeric variable is spread across 3 or more categories — and see the median, the spread, and outliers at the same time.

sns.boxplot(data=parking_df, x='parking_type', y='amount', ax=ax)

Avoid when:
- You have only 2 groups. A histogram with hue or a simple comparison of means is clearer.
- Your audience is non-technical. Box plots need explanation (what is an IQR?); a bar chart of means with error bars is easier to grasp.

Reading a box plot

Box = interquartile range (IQR), the middle 50% of the data.
Line inside the box = median.
Whiskers = extend to the most extreme data point within 1.5× IQR of the box edges.
Dots beyond the whiskers = outliers.

The median tells you more than the mean when the data is skewed. A box plot makes that visible.
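The pieces of a box plot can be computed by hand. This is a sketch on invented values using NumPy's default (linear) percentile method; plotting libraries may use a slightly different quantile rule.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 300])  # invented

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)              # 32.5 55.0 77.5
print(inside.min(), inside.max())  # 10 90
print(outliers)                    # [300]
```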

Show individual points for small datasets

When n < 50 per group, the box summary can mislead. Show the points themselves:

sns.boxplot(data=df, x='parking_type', y='amount', ax=ax)
sns.stripplot(data=df, x='parking_type', y='amount',
              color='black', alpha=0.4, size=2, ax=ax)

6. Scatter Plot — Link Between Two Numeric Variables

Use when: you want to check whether two numeric variables move together — for example, does parking duration predict the payment amount?

sns.scatterplot(data=parking_df, x='duration_mins', y='amount',
                hue='parking_type', alpha=0.5)

Avoid when:
- One axis is a category. Use a box plot or bar chart.
- You have millions of points. They overlap and hide the pattern; subsample, use hexbin, or a 2D density plot instead.
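Two of the options for large datasets, subsampling and hexbin, can be sketched side by side. The data is synthetic, and the sample size and grid size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200_000  # synthetic stand-in for a large dataset
duration = rng.gamma(shape=2.0, scale=60.0, size=n)
big_df = pd.DataFrame({
    "duration_mins": duration,
    "amount": duration * 0.5 + rng.normal(0, 10, size=n),
})

fig, (ax_sub, ax_hex) = plt.subplots(1, 2, figsize=(12, 4))

# Option 1: plot a random subsample
sample = big_df.sample(n=5_000, random_state=0)
ax_sub.scatter(sample["duration_mins"], sample["amount"], s=4, alpha=0.3)
ax_sub.set_title("Random subsample (5,000 points)")

# Option 2: bin into hexagons; color encodes the count per cell
hb = ax_hex.hexbin(big_df["duration_mins"], big_df["amount"],
                   gridsize=40, cmap="Blues", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="Points per cell")
ax_hex.set_title("Hexbin (all points)")
```

Subsampling keeps individual points visible but can drop rare cases. Hexbin uses every point but hides individual observations.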

Add a regression line to confirm the direction

sns.regplot(data=parking_df, x='duration_mins', y='amount',
            scatter_kws={'alpha': 0.3})

The shaded band is the confidence interval for the fitted line (95% by default in seaborn). A wide band means the slope is uncertain, usually because the data is noisy or the sample is small.

Correlation is not causation

A scatter plot shows that two things move together. It does not show that one causes the other. If duration and amount are linked, that is mostly because pricing is time-based — not a discovery.

Common mistake: ignoring outliers

A few outliers can hide a real pattern, or create one that is not really there. Always check whether the pattern still holds after you remove extreme values.
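A quick way to run that check is to compare the correlation with and without the extreme rows. The data is invented, and the 95th-percentile cutoff is an assumed trimming rule, not a standard.

```python
import pandas as pd

# Invented data: five weakly related points plus one extreme value
df = pd.DataFrame({
    "duration_mins": [1, 2, 3, 4, 5, 50],
    "amount":        [5, 3, 4, 2, 6, 60],
})

r_all = df["duration_mins"].corr(df["amount"])

# Assumed trimming rule: drop rows above the 95th percentile of x
cutoff = df["duration_mins"].quantile(0.95)
trimmed = df[df["duration_mins"] <= cutoff]
r_trimmed = trimmed["duration_mins"].corr(trimmed["amount"])

print(round(float(r_all), 2), round(float(r_trimmed), 2))
# 1.0 0.1
```

Here a single point creates a near-perfect correlation that the other five points do not support.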

7. Heatmap — Values Across a 2-D Grid

Use when: you have a matrix of values — a correlation table, or a pivot table with two category dimensions.

# correlation heatmap
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)

# pivot heatmap: revenue by station × month
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd')

Avoid when:
- You have fewer than ~4 rows or columns. A table or grouped bar chart is cleaner.
- You need to compare exact values. Color is harder to read precisely than bar length; add annot=True to print the numbers if precision matters.

Choosing a color palette

Palette	Use case
`'coolwarm'`	Diverging values (correlation: -1 to +1, growth: negative to positive)
`'YlOrRd'`	One direction (revenue, volume — higher = more intense)
`'Blues'`	One color, less visually loud

Always set symmetric vmin and vmax for diverging palettes (or pass center=0) so the midpoint (zero or neutral) maps to the palette's neutral middle color the same way every time:

sns.heatmap(corr, cmap='coolwarm', vmin=-1, vmax=1)
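Correlations have natural limits of -1 and +1. For other diverging data, such as growth rates, one option is to derive symmetric limits from the data. The sketch below uses invented growth figures and plain Matplotlib imshow; the same vmin and vmax values can be passed to sns.heatmap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented month-over-month growth (%) by station
growth = pd.DataFrame(
    [[4.0, -2.5, 1.0], [12.0, 3.0, -1.5]],
    index=["A", "B"], columns=["Jan", "Feb", "Mar"],
)

# Symmetric limits keep 0 at the exact middle of the color scale
limit = float(np.abs(growth.to_numpy()).max())

fig, ax = plt.subplots(figsize=(5, 2.5))
im = ax.imshow(growth.to_numpy(), cmap="coolwarm", vmin=-limit, vmax=limit)
ax.set_xticks(range(growth.shape[1]), labels=list(growth.columns))
ax.set_yticks(range(growth.shape[0]), labels=list(growth.index))
fig.colorbar(im, ax=ax, label="MoM growth (%)")
```

Without the symmetric limits, the scale would run from -2.5 to 12, and a small positive value would be drawn in a "negative" color.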

Pivot heatmap vs. line chart for trend data

A pivot heatmap (station × month) lets you scan two dimensions at once — which stations are growing and which months are peaks. A line chart per station is better when you care about the exact shape of the trend. Use both: the heatmap for the overview, line charts for the follow-up.
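The "line chart per station" follow-up is often drawn as small multiples: one panel per station on a shared y-axis. A minimal sketch with invented figures:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented revenue by month (rows) and station (columns), in thousands
by_station = pd.DataFrame(
    {"A": [100, 110, 125, 140], "B": [80, 78, 75, 70], "C": [50, 65, 55, 60]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# One panel per station; sharey=True keeps the scales comparable
fig, axes = plt.subplots(1, by_station.shape[1], figsize=(10, 3), sharey=True)
for ax, station in zip(axes, by_station.columns):
    ax.plot(by_station.index, by_station[station], marker="o")
    ax.set_title(f"Station {station}")
    ax.set_xlabel("Month")
axes[0].set_ylabel("Revenue (NTD, thousands)")
fig.tight_layout()
```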

Common Mistakes

1. Using a Pie Chart with Too Many Slices

# AVOID: 8 slices, many with near-identical angles
df['payment_method'].value_counts().plot(kind='pie')

# BETTER: bar chart, or merge the small categories into "Other"
counts = df['payment_method'].value_counts()   # sorted, largest first
top = counts.head(4).copy()                    # 4 + "Other" keeps the pie at 5 slices
top['Other'] = counts.iloc[4:].sum()
top.plot(kind='pie')

2. Skipping the Zero Baseline on Bar Charts

# AVOID: y-axis starts at 80000, so a small difference looks huge
ax.set_ylim(80000, 120000)

# CORRECT: bar charts should always start at 0
ax.set_ylim(bottom=0)

Line charts can use a non-zero baseline when the change is small compared to the absolute value, but bar charts never should — bar length stands for the value itself.
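The size of the distortion is simple arithmetic. The sketch below uses a hypothetical helper, drawn_ratio, to compare how long two bars are drawn under different baselines; the values are invented.

```python
def drawn_ratio(a: float, b: float, baseline: float) -> float:
    """Ratio of the drawn bar lengths for values a and b, given the axis baseline."""
    return (b - baseline) / (a - baseline)

a, b = 100_000, 110_000  # invented values: b is 10% larger than a

print(drawn_ratio(a, b, baseline=0))       # 1.1
print(drawn_ratio(a, b, baseline=80_000))  # 1.5
```

With the axis cut at 80,000, a 10% difference is drawn as a 50% difference.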

3. Plotting Means Without Showing Spread

A bar chart of means hides whether the groups have similar distributions or very different ones.

# Shows only the mean — misleading if the spreads differ
sns.barplot(data=df, x='parking_type', y='amount')

# Better: box plot, or bar + stripplot on top
sns.boxplot(data=df, x='parking_type', y='amount')

If you have to use a bar chart of means, at least add error bars:

sns.barplot(data=df, x='parking_type', y='amount', errorbar='sd')

4. Using Color to Encode the Same Variable Twice

# AVOID: x-axis already shows station_code — color adds nothing
sns.barplot(data=station_rev, x='station_code', y='amount',
            hue='station_code')   # repeats the same info

# Use hue only when it shows a DIFFERENT variable
sns.barplot(data=df, x='station_code', y='amount',
            hue='parking_type')   # hue = a second dimension

5. Not Labeling Axes or Units

# Always set axis labels with units
ax.set_xlabel('Month')
ax.set_ylabel('Revenue (NTD)')
ax.set_title('Monthly Revenue by Station')

A chart with no units is incomplete. "Revenue" alone does not say NTD, USD, or thousands.
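Units can also be carried by the tick labels themselves. A minimal sketch using Matplotlib's FuncFormatter follows; the thousands helper and the data are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def thousands(value, pos=None):
    """Format a tick value in thousands, e.g. 120000 -> '120K'."""
    return f"{value / 1000:,.0f}K"

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(["Jan", "Feb", "Mar"], [100_000, 120_000, 90_000], marker="o")  # invented
ax.yaxis.set_major_formatter(FuncFormatter(thousands))
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (NTD)")
```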

6. Comparing Groups of Very Different Sizes Using Raw Counts

# AVOID: Station A has 10x more sessions — comparing raw counts means nothing
sns.countplot(data=df, x='station_code')

# BETTER: switch to per-session averages
sns.barplot(data=df, x='station_code', y='amount', estimator='mean')

# OR: show both total and average side by side
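The side-by-side option might look like the sketch below. The sessions are invented: station A is busy with small payments, station B has few but large ones.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented sessions
df = pd.DataFrame({
    "station_code": ["A"] * 8 + ["B"] * 2,
    "amount": [50, 60, 40, 50, 55, 45, 50, 50, 150, 170],
})

summary = df.groupby("station_code")["amount"].agg(
    total="sum", average="mean", sessions="count"
)

fig, (ax_total, ax_avg) = plt.subplots(1, 2, figsize=(9, 3.5))
ax_total.bar(summary.index, summary["total"])
ax_total.set_title("Total revenue (NTD)")
ax_avg.bar(summary.index, summary["average"])
ax_avg.set_title("Average per session (NTD)")
fig.tight_layout()
```

Station A leads on total, station B on average. Showing both panels keeps either number from being read as the whole story.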

Dashboard Composition

Real reporting tasks need several charts working together. The rule: each chart answers one question; together they tell one story.

Principle: Overview → Detail

Design the dashboard so the reader moves from a high-level summary to the details.

[Top row]   KPI metrics (total revenue, total visits, avg amount)
[Middle]    Trend over time (line chart — what is happening?)
[Bottom]    Breakdown (bar chart by station, heatmap by station × month)

Example: Monthly Station Report

fig = plt.figure(figsize=(18, 12))

# top row: trend + MoM change
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)

# bottom row: station breakdown + heatmap
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)

# top-left: monthly revenue + rolling avg
ax1.plot(monthly['month_str'], monthly['revenue'], marker='o', label='Monthly')
ax1.plot(monthly['month_str'], monthly['rolling_3m'], linestyle='--', label='3M Avg')
ax1.set_title('Monthly Revenue')
ax1.legend()
ax1.tick_params(axis='x', rotation=45)

# top-right: MoM % change (green/red bars)
colors = ['green' if v >= 0 else 'tomato' for v in monthly['mom_pct'].fillna(0)]
ax2.bar(monthly['month_str'], monthly['mom_pct'].fillna(0), color=colors)
ax2.axhline(0, color='black', linewidth=0.8)
ax2.set_title('Month-over-Month Change (%)')
ax2.tick_params(axis='x', rotation=45)

# bottom-left: revenue by station
sns.barplot(data=station_rev, x='station_code', y='amount', ax=ax3)
ax3.set_title('Total Revenue by Station')
for c in ax3.containers:
    ax3.bar_label(c, fmt='%.0f', padding=3, fontsize=8)

# bottom-right: station × month heatmap
pivot = parking_df.pivot_table(
    values='amount', index='station_code',
    columns=parking_df['entry_time'].dt.month, aggfunc='sum'
)
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd', ax=ax4)
ax4.set_title('Revenue by Station × Month')
ax4.set_xlabel('Month')

plt.suptitle('Parking System — Monthly Report', fontsize=16, y=1.01)
plt.tight_layout()
plt.savefig('monthly_report.png', dpi=150, bbox_inches='tight')
plt.show()

How to read the composition:
- Top-left tells you what the trend is. Top-right tells you how fast it is changing.
- Bottom-left ranks stations by total. Bottom-right shows how much each station earned in each month, the question the bar chart on its own cannot answer.
- The line chart and the heatmap fit together: one shows the shape, the other shows the size across two dimensions at the same time.
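The report code uses monthly['mom_pct'] and monthly['rolling_3m'] without showing where they come from. One plausible derivation from transaction-level data is sketched below; the rows are invented, and the aggregation is an assumption about how monthly was built.

```python
import pandas as pd

# Invented transactions; column names follow the report example
parking_df = pd.DataFrame({
    "entry_time": pd.to_datetime(
        ["2025-01-05", "2025-01-20", "2025-02-03", "2025-02-18", "2025-03-10"]
    ),
    "amount": [60, 40, 70, 50, 90],
})

monthly = (
    parking_df
    .assign(month_str=parking_df["entry_time"].dt.strftime("%Y-%m"))
    .groupby("month_str", as_index=False)["amount"].sum()
    .rename(columns={"amount": "revenue"})
)
# Percent change versus the previous month; the first month has no predecessor
monthly["mom_pct"] = monthly["revenue"].pct_change() * 100
monthly["rolling_3m"] = monthly["revenue"].rolling(3, center=True).mean()
print(monthly)
```

The first mom_pct value is NaN, which is why the report code calls fillna(0) before plotting.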
