Blog - line plot #23

Open: wants to merge 5 commits into base: gh-pages
Changes from 3 commits
84 changes: 84 additions & 0 deletions _posts/altair-to-mpl-line.md
@@ -0,0 +1,84 @@
---
layout: post
title: "Making a Line Plot"
date: 2018-08-15 15:30:00 -0500
author: "Kimberly Orr and Nabarun Pal"
categories: user-guide
tags: "intro about line"
excerpt_separator: <!--read more-->
---

# Making a Line Plot
Altair works best with [long-form](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data) data. This is where each row contains a single observation along with all of its metadata stored as values.

Matplotlib works a little better with wide-form data.

Review comment (Member): What's the rationale behind this statement?

Reply (Collaborator, Author): That was just my impression after converting so many Altair charts to Matplotlib by hand. It seems like the long-form data usually has to be reformatted into something that more closely resembles wide-form data for Matplotlib whenever things get slightly complicated (e.g., scatter plots that have categorical colors and line plots like the one in this post).

Is it incorrect to say that mpl works a little better with wide-form data? Would it be better to just leave that statement out?
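
To make the long-form versus wide-form distinction in this thread concrete, here is a minimal sketch that reshapes the post's example data into wide form. It assumes pandas' `DataFrame.pivot` is an acceptable way to do the reshaping; the variable name `wide` is only illustrative and does not appear in the post.

```python
import pandas as pd

# Long-form data from the post: one observation per row.
df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
})

# Wide form: one row per 'set', one column per 'location'.
# 'wide' is an illustrative name, not something the post defines.
wide = df.pivot(index='set', columns='location', values='amount')
print(wide)
```

In the wide frame each location is its own column, which is roughly the shape the reply describes reaching for when plotting one line per category by hand.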


Since mpl-altair converts from Altair to Matplotlib, let's look at how to create a line plot using Altair, Matplotlib, and mpl-altair with the following long-form data.

```python
import pandas as pd
df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
})
```

**set** | **amount** | **location**
:---: | :---: | :---:
1 | 1 | a
2 | 4 | a
1 | 5 | b
2 | 3 | b
1 | 1 | c
2 | 7 | c
1 | 2 | d
2 | 9 | d

A possible scenario for this dataset would be an experiment being run in several different locations with 2 measurements taken at each location. The goal of the visualization is to show how the amount changed between the two sets of measurements at each location.

Review comment (Member): The goal of the visualization


## Altair
If we want to plot lines to show how each location changed between set one and set two,
we need to specify the data, tell Altair to plot lines with `mark_line()`, and link the x
encoding channel with 'set', the y channel with 'amount', and the color channel with 'location'.

Review comment (Member): Maybe turn into a list:

  1. Specify the data: `alt.Chart(df)`
  2. Tell Altair to plot lines with `mark_line()`
  3. Link the encoding channels:
  • x with 'set'
  • y with 'amount'
    Etc.

```python
import altair as alt
alt.Chart(df).mark_line().encode(
    alt.X('set'),
    alt.Y('amount'),
    alt.Color('location')
)
```
![png](pics/altair-to-mpl-line_0.png)

## Matplotlib
In Matplotlib, just like with a categorical scatter plot, we have to plot a new line for every location.

Review comment (Member): Link to cat scatter post if you're gonna make a reference to it


Specifying a label with each line allows us to generate a legend with `ax.legend()`.

Review comment (Member): Label for each line


```python
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
for loc, subset in df.groupby('location'):
    ax.plot('set', 'amount', data=subset, label=loc)
ax.set_xlabel('set')
ax.set_ylabel('amount')
ax.legend(title='location')
plt.grid()
plt.show()
```
![png](pics/altair-to-mpl-line_1.png)
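
As a rough check of what the loop in this section builds, the sketch below repeats it and then inspects the axes. It assumes a headless run (hence the `Agg` backend) and passes the columns directly instead of using the `data=` keyword; the names `legend` and `labels` are illustrative and not part of the post.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available, render off-screen
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'set': [1, 2, 1, 2, 1, 2, 1, 2],
    'amount': [1, 4, 5, 3, 1, 7, 2, 9],
    'location': ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
})

fig, ax = plt.subplots()
# One ax.plot call, and therefore one Line2D, per location group.
for loc, subset in df.groupby('location'):
    ax.plot(subset['set'], subset['amount'], label=loc)

# The legend picks up the label passed to each plot call.
legend = ax.legend(title='location')
labels = [text.get_text() for text in legend.get_texts()]
print(len(ax.lines), labels)
```

Each group contributes one line and one legend entry, which is why the label has to be set inside the loop rather than once on the axes.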

## mpl-altair
To render the Altair chart using Matplotlib:
```python
import altair as alt
import matplotlib.pyplot as plt
import mplaltair
chart = alt.Chart(df).mark_line().encode(
    alt.X('set'),
    alt.Y('amount'),
    alt.Color('location')
)
fig, ax = mplaltair.convert(chart)
plt.show()
```
![png](pics/altair-to-mpl-line_2.png)
Binary file added _posts/pics/altair-to-mpl-line_0.png
Binary file added _posts/pics/altair-to-mpl-line_1.png
Binary file added _posts/pics/altair-to-mpl-line_2.png