Blog - complex scatter scatter plot #24

Open
wants to merge 4 commits into
base: gh-pages
107 changes: 107 additions & 0 deletions _posts/2018-08-15-altair-to-mpl-scatter-part2.md
@@ -0,0 +1,107 @@
---
layout: post
title: "Making a Scatter Plot - Part 2"
date: 2018-08-15 17:00:00 -0500
author: "Kimberly Orr, Nabarun Pal"
categories: user-guide
tags: "intro about scatter"
excerpt_separator: <!--read more-->
---

# Making a Complex Scatter Plot
At the time of writing, mpl-altair does not support scatter plots with nominal or ordinal color encodings. This post therefore shows how to create a complex scatter plot in Altair and in Matplotlib, and how mpl-altair _should_ convert the chart in the future.

In the [first part]({{ site.baseurl }}{% link _posts/2018-08-15-altair-to-mpl-scatter-part1.md %}), we made a simple scatter plot. This post will look at a more complex plot.
We'll use the cars dataset again:
```python
from vega_datasets import data
cars = data.cars()
cars.head()
```
**Acceleration** | **Cylinders** | **Displacement** | **Horsepower** | **Miles_per_Gallon** | **Name** | **Origin** | **Weight_in_lbs** | **Year**
:---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---:
12.0 | 8 | 307.0 | 130.0 | 18.0 | chevrolet chevelle malibu | USA | 3504 | 1970-01-01
11.5 | 8 | 350.0 | 165.0 | 15.0 | buick skylark 320 | USA | 3693 | 1970-01-01
11.0 | 8 | 318.0 | 150.0 | 18.0 | plymouth satellite | USA | 3436 | 1970-01-01
12.0 | 8 | 304.0 | 150.0 | 16.0 | amc rebel sst | USA | 3433 | 1970-01-01
10.5 | 8 | 302.0 | 140.0 | 17.0 | ford torino | USA | 3449 | 1970-01-01

In addition to plotting Horsepower vs. Weight, let's color each point by its country of origin.

## Altair
Since Altair is based on linking columns to encodings, we have to specify
that the color encoding comes from the _Origin_ column.

Also, notice that a legend is automatically generated.
```python
import altair as alt
alt.Chart(cars).mark_point().encode(
    alt.X('Weight_in_lbs'),
    alt.Y('Horsepower'),
    alt.Color('Origin')
)
```
![png](pics/altair-to-mpl-scatter-part2_0.png)

## Matplotlib
If we wanted to color our points by a quantitative variable, we could have just
used
```python
ax.scatter('Weight_in_lbs', 'Horsepower', c='quantitative_column', data=cars)
```
However, `ax.scatter` currently can't map a categorical column to colors this way.
So, we have to plot the points from each country as separate scatter plots on the same axes object.
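
For comparison, here is a minimal sketch of the quantitative case. It assumes a small frame, `cars_small` (a name introduced here), built from the first four rows of the table above, and uses `Acceleration` as the quantitative column. The `Agg` backend is also an assumption, chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# First four rows of the cars table shown earlier (subset of columns)
cars_small = pd.DataFrame({
    'Weight_in_lbs': [3504, 3693, 3436, 3433],
    'Horsepower': [130.0, 165.0, 150.0, 150.0],
    'Acceleration': [12.0, 11.5, 11.0, 12.0],
})

fig, ax = plt.subplots()
# With data=..., the string passed to c is looked up as a column name
points = ax.scatter('Weight_in_lbs', 'Horsepower', c='Acceleration', data=cars_small)
# A colorbar plays the role that a legend plays for categorical colors
fig.colorbar(points, ax=ax, label='Acceleration')
```

A categorical column such as `Origin` has no such numeric mapping, which is why the rest of this section uses one scatter call per category.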

There are a couple of options for splitting the data by category. One is to loop over `df['col'].unique()` and filter the rows for each value. Another is to loop over `df.groupby('col')`, which yields each subset directly in the for loop.
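
The two ways of splitting the data can be compared on a small frame. This is only a sketch: `cars_small` and its rows are made up for illustration and are not taken from the cars dataset.

```python
import pandas as pd

# Made-up rows standing in for the cars dataset
cars_small = pd.DataFrame({
    'Origin': ['USA', 'Japan', 'USA', 'Europe'],
    'Weight_in_lbs': [3504, 2130, 3693, 1835],
    'Horsepower': [130.0, 88.0, 165.0, 46.0],
})

# Option 1: filter with a boolean mask for each unique value
by_unique = {
    origin: cars_small[cars_small['Origin'] == origin]
    for origin in cars_small['Origin'].unique()
}

# Option 2: groupby yields (key, subset) pairs directly
by_groupby = {origin: subset for origin, subset in cars_small.groupby('Origin')}

print(list(by_unique))   # order of first appearance
print(list(by_groupby))  # sorted by key (the groupby default)
```

Both options produce the same subsets. Only the iteration order differs, which in turn determines the order of the legend entries.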

Passing `label` to each `scatter` call allows us to call `ax.legend()` to generate a legend.
```python
import matplotlib.pyplot as plt
```
```python
# Option 1: df['col'].unique()
fig, ax = plt.subplots()
# Plot each origin's rows as a separate, labeled scatter on the same axes
for origin in cars['Origin'].unique():
    d_ = cars[cars['Origin'] == origin]
    ax.scatter(data=d_, x='Weight_in_lbs', y='Horsepower', label=origin)
ax.set_xlabel('Weight_in_lbs')
ax.set_ylabel('Horsepower')
ax.set_xlim([0, None])
ax.set_ylim([0, None])
ax.legend(title='Origin')
ax.grid()
fig.show()
```
![png](pics/altair-to-mpl-scatter-part2_1.png)

```python
# Option 2: df.groupby()
fig, ax = plt.subplots()
# Group by the column name (not a list) so each key is a plain string label
for car_origin, cars_subset in cars.groupby('Origin'):
    ax.scatter(data=cars_subset, x='Weight_in_lbs', y='Horsepower', label=car_origin)
ax.set_xlabel('Weight_in_lbs')
ax.set_ylabel('Horsepower')
ax.set_xlim([0, None])
ax.set_ylim([0, None])
ax.legend(title='Origin')
ax.grid()
fig.show()
```
![png](pics/altair-to-mpl-scatter-part2_2.png)
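
If this pattern is needed more than once, the loop can be wrapped in a small helper. The sketch below is one possible shape: `scatter_by_category` is a hypothetical name, not a Matplotlib function, and `cars_small` holds made-up rows. The `Agg` backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_category(ax, df, x, y, category):
    """Hypothetical helper: one scatter call per category value, plus a legend."""
    for value, subset in df.groupby(category):
        ax.scatter(x, y, data=subset, label=value)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=category)
    return ax


# Made-up rows standing in for the cars dataset
cars_small = pd.DataFrame({
    'Origin': ['USA', 'Japan', 'USA', 'Europe'],
    'Weight_in_lbs': [3504, 2130, 3693, 1835],
    'Horsepower': [130.0, 88.0, 165.0, 46.0],
})

fig, ax = plt.subplots()
scatter_by_category(ax, cars_small, 'Weight_in_lbs', 'Horsepower', 'Origin')
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Each category becomes its own collection on the axes, so Matplotlib's default color cycle assigns the colors without any extra work.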

## mpl-altair
At the time of writing, mpl-altair doesn't support scatter plots with categorical colors.

If mpl-altair supported categorical colors with scatter plots, this is how an Altair chart would be rendered in mpl-altair:
```python
import altair as alt
import matplotlib.pyplot as plt
import mplaltair
chart = alt.Chart(cars).mark_point().encode(
    alt.X('Weight_in_lbs'),
    alt.Y('Horsepower'),
    alt.Color('Origin')
)
fig, ax = mplaltair.convert(chart)
plt.show()
```
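
One way such a conversion could decide between the two Matplotlib approaches is to dispatch on the type of the color encoding. The sketch below is purely illustrative and is not mpl-altair's implementation or API: `draw_points` is a hypothetical function, the `encoding` dict is written by hand in the Vega-Lite field/type shape, and `cars_small` holds made-up rows.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def draw_points(ax, df, encoding):
    """Hypothetical dispatch on the color encoding type (not mpl-altair's API)."""
    x = encoding['x']['field']
    y = encoding['y']['field']
    color = encoding.get('color')
    if color is None:
        ax.scatter(x, y, data=df)
    elif color['type'] == 'quantitative':
        # Numeric column: a single call, colored through the colormap
        ax.scatter(x, y, c=color['field'], data=df)
    else:
        # Nominal or ordinal column: one labeled scatter call per category
        for value, subset in df.groupby(color['field']):
            ax.scatter(x, y, data=subset, label=value)
        ax.legend(title=color['field'])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


# Made-up rows standing in for the cars dataset
cars_small = pd.DataFrame({
    'Origin': ['USA', 'Japan', 'USA', 'Europe'],
    'Weight_in_lbs': [3504, 2130, 3693, 1835],
    'Horsepower': [130.0, 88.0, 165.0, 46.0],
})
encoding = {
    'x': {'field': 'Weight_in_lbs', 'type': 'quantitative'},
    'y': {'field': 'Horsepower', 'type': 'quantitative'},
    'color': {'field': 'Origin', 'type': 'nominal'},
}

fig, ax = plt.subplots()
draw_points(ax, cars_small, encoding)
```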
Binary file added _posts/pics/altair-to-mpl-scatter-part2_0.png
Binary file added _posts/pics/altair-to-mpl-scatter-part2_1.png
Binary file added _posts/pics/altair-to-mpl-scatter-part2_2.png