DRAFT: Plotting (and later fitting) Protocol #423

@henryiii

I'm working on defining a Protocol for plotting histograms. To do so, I need some way to access the values and optionally variances so that mplhep and others can decide on what to plot. Here is one possible suggestion:

from numpy.typing import ArrayLike # requires NumPy master
from typing import Protocol, Optional, Tuple, Union, Iterable

class PlottableAxis(Protocol):
    label: str # May be removed soon

    edges: ArrayLike

    # For non-categorical axes, this returns None
    categories: Union[Iterable[int], Iterable[str], None]

class PlottableHistogram(Protocol):
    axes: Tuple[PlottableAxis, ...]  # one entry per dimension
   
    # Values returns the array or the values array for specialized accumulators
    def values(self, flow=False) -> ArrayLike: ...

    # Variance returns the variance if applicable, otherwise None
    # If counts is none, variance returns NaN for that cell (mean storages)
    def variances(self, flow=False) -> Optional[ArrayLike]: ...

We can look at labeling in a later draft, but there are a few key points:

  • .values() is like .view() for simple storages, and like .view().value for complex ones, giving plotting libraries a consistent way to access the central values for plots.
  • .variances() returns None if a storage does not provide variance information, and an array if it does. A plotting library can then write if (variances := h.variances()) is not None: to decide whether to plot error bars, or do something else reasonable. The explicit None check matters because the truth value of a multi-element NumPy array is ambiguous and raises an error.
  • .axes[i].categories is an array of labels if this is a Category axis, otherwise it is None.
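
To make the points above concrete, here is a minimal sketch of how a plotting library might consume the protocol. ToyAxis, ToyHistogram, and error_bars are hypothetical names made up for illustration, ArrayLike is replaced by a plain stand-in so the example needs only the standard library, and the toy class ignores the flow argument. It is not how boost-histogram or mplhep implement anything.

```python
import math
from typing import Any, Iterable, List, Optional, Protocol, Tuple, Union

ArrayLike = Any  # stand-in for numpy.typing.ArrayLike (assumption, keeps this stdlib-only)


class PlottableAxis(Protocol):
    label: str
    edges: ArrayLike
    categories: Union[Iterable[int], Iterable[str], None]


class PlottableHistogram(Protocol):
    axes: Tuple[PlottableAxis, ...]

    def values(self, flow: bool = False) -> ArrayLike: ...

    def variances(self, flow: bool = False) -> Optional[ArrayLike]: ...


class ToyAxis:
    """Hypothetical regular axis: has edges, no categories."""

    def __init__(self, edges, label=""):
        self.label = label
        self.edges = list(edges)
        self.categories = None


class ToyHistogram:
    """Hypothetical histogram that satisfies the protocol structurally."""

    def __init__(self, axes, values, variances=None):
        self.axes = tuple(axes)
        self._values = list(values)
        self._variances = None if variances is None else list(variances)

    def values(self, flow=False):
        # This toy has no under/overflow bins, so flow is ignored.
        return self._values

    def variances(self, flow=False):
        return self._variances


def error_bars(h: PlottableHistogram) -> Optional[List[float]]:
    """Return per-bin standard deviations, or None if there are no variances."""
    # Compare against None explicitly; do not rely on the array's truth value.
    if (variances := h.variances()) is None:
        return None
    return [math.sqrt(v) for v in variances]


weighted = ToyHistogram([ToyAxis([0, 1, 2])], [4.0, 9.0], variances=[4.0, 9.0])
plain = ToyHistogram([ToyAxis([0, 1, 2])], [4.0, 9.0])
print(error_bars(weighted))
print(error_bars(plain))
```

Because Protocol uses structural typing, ToyHistogram does not inherit from anything; a static type checker accepts it wherever a PlottableHistogram is expected as long as the members match.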

I didn't add .counts(), but maybe that should be included too?

@HDembinski, what do you think?

This assumes the main object itself is the API. You could also have a __histogram_dict__ that returns something like what you see above. That would be much easier to add to existing libraries like Physt and would not change their public API. On the flip side, the data is duplicated, it is all or nothing, and it doesn't provide API consistency for users. (The double underscores are for clarity; you really aren't supposed to add your own names with double underscores, so as an API choice it might not be ideal.)
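
As a rough sketch of the __histogram_dict__ alternative: nothing about its shape is specified in this issue, so the dictionary keys ("axes", "values", "variances"), the choice of a method rather than a property, and the LegacyHistogram and plottable_values names below are all assumptions made up for illustration.

```python
from typing import Any, Dict, List


class LegacyHistogram:
    """Hypothetical existing class whose public API should stay unchanged."""

    def __init__(self, bin_edges, counts):
        self.bin_edges = list(bin_edges)
        self.counts = list(counts)

    def __histogram_dict__(self) -> Dict[str, Any]:
        # Assumed layout, mirroring the Protocol members above.
        return {
            "axes": [{"label": "", "edges": self.bin_edges, "categories": None}],
            "values": self.counts,
            "variances": None,  # this toy storage has no variance information
        }


def plottable_values(obj: Any) -> List[float]:
    """Fetch central values through the hypothetical export hook."""
    exporter = getattr(obj, "__histogram_dict__", None)
    if exporter is None:
        raise TypeError(f"{type(obj).__name__} does not provide __histogram_dict__")
    return list(exporter()["values"])


print(plottable_values(LegacyHistogram([0, 1, 2], [3, 5])))
```

This shows the trade-off mentioned above: the library only adds one hook, but a consumer gets a separate copy of the data rather than a consistent set of methods on the object itself.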

PS: This is post 0.10.0, possibly post 1.0, so no rush here, feel free to come up with something better.

PPS: Initially, boost-histogram and Uproot4 would implement this, and Hist would get it for free since it is a boost-histogram.
