diff --git a/doc/internals/how-to-create-custom-index.rst b/doc/internals/how-to-create-custom-index.rst index 93805229db1..90b3412c2cb 100644 --- a/doc/internals/how-to-create-custom-index.rst +++ b/doc/internals/how-to-create-custom-index.rst @@ -1,5 +1,7 @@ .. currentmodule:: xarray +.. _internals.custom indexes: + How to create a custom index ============================ diff --git a/doc/internals/index.rst b/doc/internals/index.rst index 46972ff69bd..b2a37900338 100644 --- a/doc/internals/index.rst +++ b/doc/internals/index.rst @@ -19,9 +19,10 @@ The pages in this section are intended for: :hidden: internal-design + interoperability duck-arrays-integration chunked-arrays extending-xarray - zarr-encoding-spec how-to-add-new-backend how-to-create-custom-index + zarr-encoding-spec diff --git a/doc/internals/internal-design.rst b/doc/internals/internal-design.rst index 11b4ee39da9..55ab2d79dbe 100644 --- a/doc/internals/internal-design.rst +++ b/doc/internals/internal-design.rst @@ -59,9 +59,9 @@ which is used as the basic building block behind xarray's - ``data``: The N-dimensional array (typically a NumPy or Dask array) storing the Variable's data. It must have the same number of dimensions as the length of ``dims``. -- ``attrs``: An ordered dictionary of metadata associated with this array. By +- ``attrs``: A dictionary of metadata associated with this array. By convention, xarray's built-in operations never use this metadata. -- ``encoding``: Another ordered dictionary used to store information about how +- ``encoding``: Another dictionary used to store information about how these variable's data is represented on disk. See :ref:`io.encoding` for more details. @@ -95,7 +95,7 @@ all of which are implemented by forwarding on to the underlying ``Variable`` obj In addition, a :py:class:`~xarray.DataArray` stores additional ``Variable`` objects stored in a dict under the private ``_coords`` attribute, each of which is referred to as a "Coordinate Variable". These coordinate variable objects are only allowed to have ``dims`` that are a subset of the data variable's ``dims``, -and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.sizes` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes. +and each dim has a specific length. This means that the full :py:attr:`~xarray.DataArray.size` of the dataarray can be represented by a dictionary mapping dimension names to integer sizes. The underlying data variable has this exact same size, and the attached coordinate variables have sizes which are some subset of the size of the data variable. Another way of saying this is that all coordinate variables must be "alignable" with the data variable. @@ -124,7 +124,7 @@ The :py:class:`~xarray.Dataset` class is a generalization of the :py:class:`~xar Internally all data variables and coordinate variables are stored under a single ``variables`` dict, and coordinates are specified by storing their names in a private ``_coord_names`` dict. -The dataset's dimensions are the set of all dims present across any variable, but (similar to in dataarrays) coordinate +The dataset's ``dims`` are the set of all dims present across any variable, but (similar to in dataarrays) coordinate variables cannot have a dimension that is not present on any data variable. When a data variable or coordinate variable is accessed, a new ``DataArray`` is again constructed from all compatible diff --git a/doc/internals/interoperability.rst b/doc/internals/interoperability.rst new file mode 100644 index 00000000000..cbd96362e35 --- /dev/null +++ b/doc/internals/interoperability.rst @@ -0,0 +1,45 @@ +.. _interoperability: + +Interoperability of Xarray +========================== + +Xarray is designed to be extremely interoperable, in many orthogonal ways. +Making xarray as flexible as possible is the common theme of most of the goals on our :ref:`roadmap`. + +This interoperability comes via a set of flexible abstractions into which the user can plug in. The current full list is: + +- :ref:`Custom file backends ` via the :py:class:`~xarray.backends.BackendEntrypoint` system, +- Numpy-like :ref:`"duck" array wrapping `, which supports the `Python Array API Standard `_, +- :ref:`Chunked distributed array computation ` via the :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` system, +- Custom :py:class:`~xarray.Index` objects for :ref:`flexible label-based lookups `, +- Extending xarray objects with domain-specific methods via :ref:`custom accessors `. + +.. warning:: + + One obvious way in which xarray could be more flexible is that whilst subclassing xarray objects is possible, we + currently don't support it in most transformations, instead recommending composition over inheritance. See the + :ref:`internal design page ` for the rationale and look at the corresponding `GH issue `_ + if you're interested in improving support for subclassing! + +.. note:: + + If you think there is another way in which xarray could become more generically flexible then please + tell us your ideas by `raising an issue to request the feature `_! + + +Whilst xarray was originally designed specifically to open ``netCDF4`` files as :py:class:`numpy.ndarray` objects labelled by :py:class:`pandas.Index` objects, +it is entirely possible today to: + +- lazily open an xarray object directly from a custom binary file format (e.g. using ``xarray.open_dataset(path, engine='my_custom_format')``, +- handle the data as any API-compliant numpy-like array type (e.g. sparse or GPU-backed), +- distribute out-of-core computation across that array type in parallel (e.g. via :ref:`dask`), +- track the physical units of the data through computations (e.g via `pint-xarray `_), +- query the data via custom index logic optimized for specific applications (e.g. an :py:class:`~xarray.Index` object backed by a KDTree structure), +- attach domain-specific logic via accessor methods (e.g. to understand geographic Coordinate Reference System metadata), +- organize hierarchical groups of xarray data in a :py:class:`~datatree.DataTree` (e.g. to treat heterogenous simulation and observational data together during analysis). + +All of these features can be provided simultaneously, using libaries compatible with the rest of the scientific python ecosystem. +In this situation xarray would be essentially a thin wrapper acting as pure-python framework, providing a common interface and +separation of concerns via various domain-agnostic abstractions. + +Most of the remaining pages in the documentation of xarray's internals describe these various types of interoperability in more detail. diff --git a/doc/whats-new.rst b/doc/whats-new.rst index 9f049b148c7..c88f685b0ba 100644 --- a/doc/whats-new.rst +++ b/doc/whats-new.rst @@ -137,6 +137,8 @@ Bug fixes Documentation ~~~~~~~~~~~~~ +- Added page on the interoperability of xarray objects. + (:pull:`7992`) By `Tom Nicholas `_. - Added xarray-regrid to the list of xarray related projects (:pull:`8272`). By `Bart Schilperoort `_.