---
layout: post
title: "Making a Line Plot"
date: 2018-08-15 15:30:00 -0500
author: "Kimberly Orr and Nabarun Pal"
categories: user-guide
tags: "intro about line"
excerpt_separator: <!--read more-->
---

# Making a Line Plot
Altair works best with [long-form](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data) data, where each row contains a single observation and all of its metadata is stored as values in other columns.

Matplotlib often works a little better with wide-form data, where each series has its own column. Long-form data usually has to be split up or reshaped before Matplotlib can plot it, especially for charts such as scatter plots with categorical colors or line plots like the one in this post.

Since mpl-altair converts from Altair to Matplotlib, let's look at how to create a line plot using Altair, Matplotlib, and mpl-altair with the following long-form data.

```python
import pandas as pd
df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
})
```

**set** | **amount** | **location**
:---: | :---: | :---:
1 | 1 | a
2 | 4 | a
1 | 5 | b
2 | 3 | b
1 | 1 | c
2 | 7 | c
1 | 2 | d
2 | 9 | d

A possible scenario for this dataset is an experiment run at several different locations, with two measurements taken at each location. The goal of the visualization is to show how the amount changed between the two sets of measurements at each location.

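To make the long-form vs. wide-form distinction from the start of this post concrete, here is a small sketch that reshapes this data with pandas. `DataFrame.pivot` turns it into wide form (one row per set, one column per location), and `DataFrame.melt` turns it back into long form. The names `wide` and `long_again` are just illustrative.

```python
import pandas as pd

# The same long-form data used throughout this post:
# one row per (set, location) observation.
df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
})

# Long -> wide: the 'set' values become the index and
# each location becomes its own column of amounts.
wide = df.pivot(index='set', columns='location', values='amount')
print(wide)

# Wide -> long: move 'set' back into a column, then stack the
# location columns back into (location, amount) rows.
long_again = (
    wide.reset_index()
        .melt(id_vars='set', var_name='location', value_name='amount')
)
print(long_again)
```

`wide` has one row per set and one column per location (`a` through `d`), while `long_again` holds the same eight observations as `df`, with the columns in a different order.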
## Altair
To plot one line per location showing how the amount changed between set one and set two, we need to:

- specify the data,
- tell Altair to draw lines with `mark_line()`, and
- map the x encoding channel to 'set', the y channel to 'amount', and the color channel to 'location'.

```python
import altair as alt
alt.Chart(df).mark_line().encode(
    alt.X('set'),
    alt.Y('amount'),
    alt.Color('location')
)
```

## Matplotlib
In Matplotlib, just like with a categorical scatter plot, we have to plot a separate line for every location.

Giving each line a label lets us generate a legend with `ax.legend()`.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Matplotlib draws one series per plot call, so plot each location separately;
# the label is what ax.legend() shows for that line.
for loc, subset in df.groupby('location'):
    ax.plot('set', 'amount', data=subset, label=loc)
ax.set_xlabel('set')
ax.set_ylabel('amount')
ax.legend(title='location')
ax.grid(True)
plt.show()
```

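The loop above splits the long-form data into one group per location because Matplotlib draws one series per `plot` call. If the data is pivoted into wide form first, each column is already one line, which is part of why wide-form data tends to be more convenient in Matplotlib. A minimal sketch of that alternative, using pandas `pivot` (this step is not part of the original example):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
})

# Wide form: index = set (the x values), one column of amounts per location.
wide = df.pivot(index='set', columns='location', values='amount')

fig, ax = plt.subplots()
# Each wide-form column is one line, so no groupby is needed.
for loc in wide.columns:
    ax.plot(wide.index, wide[loc], label=loc)
ax.set_xlabel('set')
ax.set_ylabel('amount')
ax.legend(title='location')
ax.grid(True)
plt.show()
```

Both versions draw the same four lines; the difference is only whether the per-location split happens inside a `groupby` loop or once, up front, in `pivot`.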
## mpl-altair
To render the Altair chart with Matplotlib, build the chart as before and pass it to `mplaltair.convert()`, which returns a Matplotlib figure and axes:
```python
import altair as alt
import matplotlib.pyplot as plt
import mplaltair
chart = alt.Chart(df).mark_line().encode(
    alt.X('set'),
    alt.Y('amount'),
    alt.Color('location')
)
# Convert the Altair chart into a Matplotlib figure and axes.
fig, ax = mplaltair.convert(chart)
plt.show()
```