
ENH: plotting function for "2D histograms"/"dynamic spectra" (similar to heatmap) #33560

Open
johan12345 opened this issue Apr 15, 2020 · 3 comments

johan12345 commented Apr 15, 2020

Is your feature request related to a problem?

I often need to plot a heatmap of a DataFrame which uses an IntervalIndex as its columns (and, usually, time as its index). Such a plot could also be called a "dynamic spectrum" or "2D histogram" and is used to quickly get an idea of how a spectrum develops over time.

This is slightly different from what is usually considered a heatmap (see #19008 for an example), as the bins are not necessarily equidistant and there is not necessarily a separate label for each bin. The y axis (which is used for the IntervalIndex) could even have logarithmic scaling.

Describe the solution you'd like

This could use the same API df.plot(type='heatmap') as suggested in #19008 and switch between appropriate axis scaling/labeling modes depending on whether a CategoricalIndex, IntervalIndex or other types of indices are used.
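As a rough illustration of the switching idea, the choice of axis mode could be keyed on the index type. This is only a sketch: `axis_mode` and the mode names it returns are hypothetical and not part of pandas.

```python
import pandas as pd


def axis_mode(index: pd.Index) -> str:
    """Hypothetical helper: pick an axis treatment from the index type."""
    if isinstance(index, pd.IntervalIndex):
        return "bin_edges"    # cells drawn between the interval edges
    if isinstance(index, pd.CategoricalIndex):
        return "categorical"  # one equally spaced label per category
    if isinstance(index, pd.DatetimeIndex):
        return "datetime"     # continuous time axis
    # Fallback assumption: numeric indices are continuous, the rest are labels.
    if pd.api.types.is_numeric_dtype(index):
        return "continuous"
    return "categorical"
```

For the frames described in this issue, the columns would map to "bin_edges" and the index to "datetime".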

Describe alternatives you've considered

My current implementation (see below) uses matplotlib's pcolormesh, but it needs some fiddling with the bin edges to work correctly.

Matplotlib's hist2d does not work for this use case, because the data is already stored in histogrammed form - the histogram and its bins don't need to be calculated, just plotted.

Seaborn's heatmap function seems to be limited to plotting categorical data, so both IntervalIndex and DatetimeIndex are displayed as categorical data with one label per bin, equidistant spacing, and values on the y axis sorted from top to bottom instead of bottom to top.

Additional context

My current implementation looks similar to this:

binedges = np.append(df.columns.left, df.columns.right[-1])
X, Y = np.meshgrid(df.index, binedges)
pcm = ax.pcolormesh(X, Y, df.values.T)

# then add labels, colorbar etc.

This only works if the IntervalIndex has no gaps and is non-overlapping, which would have to be checked first.
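A self-contained sketch of that approach, including the gap/overlap check, might look like the following. It is not the exact implementation referred to above: `interval_edges` is a hypothetical helper, the data is random, and the time axis assumes regularly spaced rows where each row covers the span up to the next timestamp.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def interval_edges(intervals: pd.IntervalIndex) -> np.ndarray:
    """Return the n + 1 bin edges of a sorted, gap-free, non-overlapping IntervalIndex."""
    if not intervals.is_monotonic_increasing or intervals.is_overlapping:
        raise ValueError("intervals must be sorted and non-overlapping")
    left = np.asarray(intervals.left)
    right = np.asarray(intervals.right)
    if not np.allclose(left[1:], right[:-1]):
        raise ValueError("intervals must not have gaps")
    return np.append(left, right[-1])


# Example data (made up): 5 time steps, 4 contiguous bins on [0, 1].
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((5, 4)),
    index=pd.date_range("2020-04-15", periods=5, freq="D"),
    columns=pd.interval_range(start=0, end=1, periods=4),
)

y_edges = interval_edges(df.columns)

# Assumption: rows are regularly spaced, so one extra edge closes the last row.
step = df.index[1] - df.index[0]
x_edges = df.index.append(pd.DatetimeIndex([df.index[-1] + step])).to_numpy()

fig, ax = plt.subplots()
# With shading="flat", X and Y hold cell edges: one more entry than C per axis.
pcm = ax.pcolormesh(x_edges, y_edges, df.to_numpy().T, shading="flat")
fig.colorbar(pcm, ax=ax, label="value")
```

If all bin edges are strictly positive, calling `ax.set_yscale("log")` afterwards should give the logarithmic y axis mentioned earlier; the example bins start at 0, so it is left out here.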

@johan12345 added the Enhancement and Needs Triage labels Apr 15, 2020

Rik-de-Kort commented Apr 19, 2020

I'm not sure this is a satisfactory solution, but I wanted to share it anyway since it does solve your problem in a neat way. However, it requires you to use a different library for visualization so I appreciate if that's not workable for you.
I'm using Altair, but I imagine the ggplot package can do something similar. Here's a pic. Note that it can handle missing data without distorting the axes!
[image: canvas]

And here is the code:

import numpy as np
import pandas as pd
import altair as alt

# Generate some data
df = pd.DataFrame(
    np.random.rand(10, 10),
    index=pd.date_range("2020-04-19", periods=10, freq="D"),
    columns=pd.interval_range(start=0, end=1, periods=10),
)
df = df.drop(index=df.index[4], columns=df.columns[6])

# Reshape frame for visualization.
# We cast intervals to simple tuples of their endpoints,
# "melt" the dataframe and unpack the tuples so we
# end up with a frame of the form [timestamp, value, left_endpoint, right_endpoint]
tidy = df
tidy.columns = [(interval.left, interval.right) for interval in tidy.columns]
tidy = tidy.reset_index() # Reset index necessary because pd.melt drops index, see #17440
tidy = tidy.melt(id_vars="index", var_name="energy") 
tidy[["energy0", "energy1"]] = pd.DataFrame(tidy.energy.tolist())

# Visualize.
alt.Chart(tidy).mark_rect().encode(x="monthdate(index)", y="energy0", y2="energy1", color="value")
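For comparison, here is a pandas-only sketch of the same reshaping step that reads the endpoints straight from the IntervalIndex instead of going through tuples. The names `wide`, `long`, `bin` and `time` are illustrative assumptions, and the data is random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.random((3, 4)),
    index=pd.date_range("2020-04-19", periods=3, freq="D", name="time"),
    columns=pd.interval_range(start=0, end=1, periods=4),
)

# Swap the interval columns for integer bin ids so melt sees plain labels.
bins = wide.columns
tmp = wide.copy()
tmp.columns = range(len(bins))

long = tmp.reset_index().melt(id_vars="time", var_name="bin", value_name="value")

# Look up each row's interval endpoints by bin id.
bin_ids = long["bin"].to_numpy(dtype=int)
long["energy0"] = bins.left.to_numpy()[bin_ids]
long["energy1"] = bins.right.to_numpy()[bin_ids]
long = long.drop(columns="bin")
```

The result has one row per (time, bin) cell with columns time, value, energy0 and energy1, which is the shape the chart call above expects, with "time" taking the place of "index".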

@DeeDiveT

@Rik-de-Kort This solution looks awesome! However, when I try to run your code, there is nothing shown on my screen. Do you know how to solve that? Thanks

@Rik-de-Kort

> @Rik-de-Kort This solution looks awesome! However, when I try to run your code, there is nothing shown on my screen. Do you know how to solve that? Thanks

Ah yes, add .serve() to the end of the chart, that will start a renderer. I didn't include it in case the user was in a notebook.

@jbrockmendel added the Visualization label and removed the Needs Triage label Jun 5, 2020