Merged
28 commits
2af21ab
fix: be more more caution when claiming a backend can open a URL
ianhi Sep 30, 2025
1a3e7df
add whats new entry
ianhi Sep 30, 2025
d6a47b7
fixes from review
ianhi Sep 30, 2025
7ed1f0a
more caution in scipy netcdf backend
ianhi Oct 1, 2025
60c1158
correct suffix detection for scipy backend
ianhi Oct 1, 2025
d2334e4
stricter URL detection for netcdf/dap
ianhi Oct 3, 2025
ef3e07c
no query params for h5netcdf
ianhi Oct 3, 2025
c07e7ea
scipy no urls
ianhi Oct 3, 2025
017713b
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 3, 2025
9cf669b
don't try to read magic numbers for remote uris
ianhi Oct 6, 2025
e0e2da2
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 6, 2025
bfefb21
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 8, 2025
a50b2f6
review comments
ianhi Oct 8, 2025
10d6edd
fix windows failures
ianhi Oct 8, 2025
8c77986
docs on backend resolution
ianhi Oct 8, 2025
079b290
more complete table
ianhi Oct 8, 2025
6ee2910
no horizontal scroll on table
ianhi Oct 8, 2025
418ceee
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 8, 2025
e32e93a
fix whats new header
ianhi Oct 8, 2025
f445045
correct description
ianhi Oct 8, 2025
4a717e7
case insensitivity to DAP: vs dap:
ianhi Oct 8, 2025
7dc0995
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 14, 2025
00d07ee
thredds
ianhi Oct 14, 2025
f29d7d8
move import
ianhi Oct 14, 2025
d98d6dd
claude import rules
ianhi Oct 14, 2025
8f150dd
has_pydap instead of requires pydap
ianhi Oct 14, 2025
cc64d7c
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 15, 2025
8123beb
Merge branch 'main' into fix-netcdf4-remote-zarr-detection
ianhi Oct 15, 2025
12 changes: 12 additions & 0 deletions CLAUDE.md
@@ -20,6 +20,18 @@ pre-commit run --all-files # Includes ruff and other checks
uv run dmypy run # Type checking with mypy
```

## Code Style Guidelines

### Import Organization

- **Always place imports at the top of the file** in the standard import section
- Never add imports inside functions or nested scopes unless there's a specific
  reason (e.g., circular import avoidance, optional dependencies in TYPE_CHECKING)
- Group imports following PEP 8 conventions:
  1. Standard library imports
  2. Related third-party imports
  3. Local application/library specific imports

## GitHub Interaction Guidelines

- **NEVER impersonate the user on GitHub**, always sign off with something like
178 changes: 178 additions & 0 deletions doc/user-guide/io.rst
@@ -112,6 +112,182 @@ You can learn more about using and developing backends in the
linkStyle default font-size:18pt,stroke-width:4


.. _io.backend_resolution:

Backend Selection
-----------------

When opening a file or URL without explicitly specifying the ``engine`` parameter,
xarray automatically selects an appropriate backend based on the file path or URL.
The backends are tried in order: **netcdf4 → h5netcdf → scipy → pydap → zarr**.

.. note::
   You can customize the order in which netCDF backends are tried using the
   ``netcdf_engine_order`` option in :py:func:`~xarray.set_options`:

   .. code-block:: python

       # Prefer h5netcdf over netcdf4
       xr.set_options(netcdf_engine_order=['h5netcdf', 'netcdf4', 'scipy'])

   See :ref:`options` for more details on configuration options.

The following tables show which backend will be selected for different types of URLs and files.

.. important::
   ✅ means the backend will **guess it can open** the URL or file based on its path, extension,
   or magic number, but this doesn't guarantee success. For example, not all Zarr stores are
   xarray-compatible.

   ❌ means the backend will not attempt to open it.

Remote URL Resolution
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 50 10 10 10 10 10

   * - URL
     - :ref:`netcdf4 <io.netcdf>`
     - :ref:`h5netcdf <io.hdf5>`
     - :ref:`scipy <io.netcdf>`
     - :ref:`pydap <io.opendap>`
     - :ref:`zarr <io.zarr>`
   * - ``https://example.com/store.zarr``
     - ❌
     - ❌
     - ❌
     - ❌
     - ✅
   * - ``https://example.com/data.nc``
     - ✅
     - ✅
     - ❌
     - ❌
     - ❌
   * - ``http://example.com/data.nc?var=temp``
     - ✅
     - ❌
     - ❌
     - ❌
     - ❌
   * - ``http://example.com/dap4/data.nc?var=x``
     - ✅
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``dap2://opendap.nasa.gov/dataset``
     - ❌
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``https://example.com/DAP4/data``
     - ❌
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``http://test.opendap.org/dap4/file.nc4``
     - ✅
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``https://example.com/DAP4/data.nc``
     - ✅
     - ✅
     - ❌
     - ✅
     - ❌

Local File Resolution
~~~~~~~~~~~~~~~~~~~~~

For local files, backends first try to read the file's **magic number** (first few bytes).
If the magic number **cannot be read** (e.g., file doesn't exist, no permissions), they fall
back to checking the file **extension**. If the magic number is readable but does not match
a format the backend supports, the backend returns ``False`` and does not fall back to the
extension.

.. list-table::
   :header-rows: 1
   :widths: 40 20 10 10 10 10

   * - File Path
     - Magic Number
     - :ref:`netcdf4 <io.netcdf>`
     - :ref:`h5netcdf <io.hdf5>`
     - :ref:`scipy <io.netcdf>`
     - :ref:`zarr <io.zarr>`
   * - ``/path/to/file.nc``
     - ``CDF\x01`` (netCDF3)
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``/path/to/file.nc4``
     - ``\x89HDF\r\n\x1a\n`` (HDF5/netCDF4)
     - ✅
     - ✅
     - ❌
     - ❌
   * - ``/path/to/file.nc.gz``
     - ``\x1f\x8b`` + ``CDF`` inside
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``/path/to/store.zarr/``
     - (directory)
     - ❌
     - ❌
     - ❌
     - ✅
   * - ``/path/to/file.nc``
     - *(no magic number)*
     - ✅
     - ✅
     - ✅
     - ❌
   * - ``/path/to/file.xyz``
     - ``CDF\x01`` (netCDF3)
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``/path/to/file.xyz``
     - ``\x89HDF\r\n\x1a\n`` (HDF5/netCDF4)
     - ✅
     - ✅
     - ❌
     - ❌
   * - ``/path/to/file.xyz``
     - *(no magic number)*
     - ❌
     - ❌
     - ❌
     - ❌

.. note::
   Remote URLs ending in ``.nc`` are **ambiguous**:

   - They could be netCDF files stored on a remote HTTP server (readable by ``netcdf4`` or ``h5netcdf``)
   - They could be OPeNDAP/DAP endpoints (readable by ``netcdf4`` with DAP support or ``pydap``)

   These interpretations are fundamentally incompatible. If xarray's automatic
   selection chooses the wrong backend, you must explicitly specify the ``engine`` parameter:

   .. code-block:: python

       # Force interpretation as a DAP endpoint
       ds = xr.open_dataset("http://example.com/data.nc", engine="pydap")

       # Force interpretation as a remote netCDF file
       ds = xr.open_dataset("https://example.com/data.nc", engine="netcdf4")


.. _io.netcdf:

netCDF
@@ -1213,6 +1389,8 @@ See for example : `ncdata usage examples`_
.. _Ncdata: https://ncdata.readthedocs.io/en/latest/index.html
.. _ncdata usage examples: https://github.com/pp-mo/ncdata/tree/v0.1.2?tab=readme-ov-file#correct-a-miscoded-attribute-in-iris-input

.. _io.opendap:

OPeNDAP
-------

2 changes: 1 addition & 1 deletion doc/user-guide/options.rst
@@ -18,7 +18,7 @@ Xarray offers a small number of configuration options through :py:func:`set_opti

2. Control behaviour during operations: ``arithmetic_join``, ``keep_attrs``, ``use_bottleneck``.
3. Control colormaps for plots:``cmap_divergent``, ``cmap_sequential``.
-4. Aspects of file reading: ``file_cache_maxsize``, ``warn_on_unclosed_files``.
+4. Aspects of file reading: ``file_cache_maxsize``, ``netcdf_engine_order``, ``warn_on_unclosed_files``.


You can set these options either globally
8 changes: 7 additions & 1 deletion doc/whats-new.rst
@@ -2,6 +2,7 @@

.. _whats-new:


What's New
==========

Expand Down Expand Up @@ -32,6 +33,11 @@ Bug Fixes
- Fix h5netcdf backend for format=None, use same rule as netcdf4 backend (:pull:`10859`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_

- ``netcdf4`` and ``pydap`` backends now use stricter URL detection to avoid incorrectly claiming
  remote URLs. The ``pydap`` backend now only claims URLs with explicit DAP protocol indicators
  (``dap2://`` or ``dap4://`` schemes, or ``/dap2/`` or ``/dap4/`` in the URL path). This prevents
  both backends from claiming remote Zarr stores and other non-DAP URLs when no explicit
  ``engine=`` argument is given (:pull:`10804`).
  By `Ian Hunt-Isaak <https://github.com/ianhi>`_.

Documentation
~~~~~~~~~~~~~
@@ -67,12 +73,12 @@ New features

Bug fixes
~~~~~~~~~

- Fix error raised when writing scalar variables to Zarr with ``region={}``
(:pull:`10796`).
By `Stephan Hoyer <https://github.com/shoyer>`_.



.. _whats-new.2025.09.1:

v2025.09.1 (September 29, 2025)
12 changes: 9 additions & 3 deletions xarray/backends/h5netcdf_.py
@@ -494,10 +494,16 @@ class H5netcdfBackendEntrypoint(BackendEntrypoint):
     supports_groups = True

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
+        from xarray.core.utils import is_remote_uri
+
         filename_or_obj = _normalize_filename_or_obj(filename_or_obj)
-        magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)
-        if magic_number is not None:
-            return magic_number.startswith(b"\211HDF\r\n\032\n")
+
+        # Try to read magic number for local files only
+        is_remote = isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj)
+        if not is_remote:
+            magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)
+            if magic_number is not None:
+                return magic_number.startswith(b"\211HDF\r\n\032\n")

         if isinstance(filename_or_obj, str | os.PathLike):
             _, ext = os.path.splitext(filename_or_obj)

Contributor Author

Intentionally not stripping any query params that might be present in a DAP query, so that h5netcdf does not claim to be able to open such a URL; it's my understanding that it cannot.
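
The effect described in this comment can be reproduced with the standard library alone. The sketch below assumes the h5netcdf check compares the result of `os.path.splitext` against the same extension set that the netCDF4 hunk in this PR uses (`.nc`, `.nc4`, `.cdf`); that part of `h5netcdf_.py` is not visible in the diff. `NETCDF_EXTS` and `ext_of` are names invented for this illustration and are not xarray API.

```python
import os

# Assumed extension set, copied from the netCDF4 hunk in this PR.
NETCDF_EXTS = {".nc", ".nc4", ".cdf"}


def ext_of(path: str) -> str:
    """Return the extension exactly as os.path.splitext reports it."""
    _, ext = os.path.splitext(path)
    return ext


plain = "https://example.com/data.nc"
with_query = "http://example.com/data.nc?var=temp"

# Without stripping, the query string becomes part of the "extension",
# so the membership test fails and the URL is not claimed.
for url in (plain, with_query):
    ext = ext_of(url)
    print(f"{url!r}: ext={ext!r}, claimed={ext in NETCDF_EXTS}")
```

Under that assumption, leaving the query string in place is enough to keep h5netcdf from claiming DAP-style URLs, with no extra DAP-specific check.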

38 changes: 26 additions & 12 deletions xarray/backends/netCDF4_.py
@@ -50,6 +50,7 @@
     FrozenDict,
     close_on_error,
     is_remote_uri,
+    strip_uri_params,
     try_read_magic_number_from_path,
 )
 from xarray.core.variable import Variable
@@ -701,21 +702,34 @@ class NetCDF4BackendEntrypoint(BackendEntrypoint):
     supports_groups = True

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
-        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
-            return True
+        # Helper to check if magic number is netCDF or HDF5
+        def _is_netcdf_magic(magic: bytes) -> bool:
+            return magic.startswith((b"CDF", b"\211HDF\r\n\032\n"))
+
+        # Helper to check if extension is netCDF
+        def _has_netcdf_ext(path: str | os.PathLike, is_remote: bool = False) -> bool:
+            path = str(path).rstrip("/")
+            # For remote URIs, strip query parameters and fragments
+            if is_remote:
+                path = strip_uri_params(path)
+            _, ext = os.path.splitext(path)
+            return ext in {".nc", ".nc4", ".cdf"}

-        magic_number = (
-            bytes(filename_or_obj[:8])
-            if isinstance(filename_or_obj, bytes | memoryview)
-            else try_read_magic_number_from_path(filename_or_obj)
-        )
-        if magic_number is not None:
-            # netcdf 3 or HDF5
-            return magic_number.startswith((b"CDF", b"\211HDF\r\n\032\n"))
+        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
+            # For remote URIs, check extension (accounting for query params/fragments)
+            # Remote netcdf-c can handle both regular URLs and DAP URLs
+            return _has_netcdf_ext(filename_or_obj, is_remote=True)

         if isinstance(filename_or_obj, str | os.PathLike):
-            _, ext = os.path.splitext(filename_or_obj)
-            return ext in {".nc", ".nc4", ".cdf"}
+            # For local paths, check magic number first, then extension
+            magic_number = try_read_magic_number_from_path(filename_or_obj)
+            if magic_number is not None:
+                return _is_netcdf_magic(magic_number)
+            # No magic number available, fallback to extension
+            return _has_netcdf_ext(filename_or_obj)
+
+        if isinstance(filename_or_obj, bytes | memoryview):
+            return _is_netcdf_magic(bytes(filename_or_obj[:8]))

         return False
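
The two helpers in this hunk can be tried outside xarray with a rough standalone sketch. The body of `strip_uri_params` is not shown in this diff, so `strip_query_and_fragment` below is a hypothetical stand-in built on `urllib.parse`; the real helper may behave differently. The other names are likewise illustrative only.

```python
import os
from urllib.parse import urlsplit, urlunsplit

NETCDF_EXTS = {".nc", ".nc4", ".cdf"}
NETCDF3_MAGIC = b"CDF"
HDF5_MAGIC = b"\211HDF\r\n\032\n"  # same bytes as b"\x89HDF\r\n\x1a\n"


def strip_query_and_fragment(url: str) -> str:
    """Hypothetical stand-in for strip_uri_params: drop ?query and #fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def remote_has_netcdf_ext(url: str) -> bool:
    """Extension check for remote URLs, ignoring query string and fragment."""
    path = strip_query_and_fragment(url.rstrip("/"))
    _, ext = os.path.splitext(path)
    return ext in NETCDF_EXTS


def is_netcdf_magic(header: bytes) -> bool:
    """True if the leading bytes look like netCDF3 or HDF5."""
    return header.startswith((NETCDF3_MAGIC, HDF5_MAGIC))


for url in (
    "https://example.com/store.zarr",
    "https://example.com/data.nc",
    "http://example.com/data.nc?var=temp",
    "https://example.com/DAP4/data",
):
    print(url, remote_has_netcdf_ext(url))
```

With this stand-in, the results for the four URLs agree with the netcdf4 column of the remote URL table in `io.rst`: only the two `data.nc` URLs are claimed.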

21 changes: 20 additions & 1 deletion xarray/backends/pydap_.py
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import os
 from collections.abc import Iterable
 from typing import TYPE_CHECKING, Any

@@ -209,7 +210,25 @@ class PydapBackendEntrypoint(BackendEntrypoint):
     url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.PydapBackendEntrypoint.html"

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
-        return isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj)
+        if not isinstance(filename_or_obj, str):
+            return False
+
+        # Check for explicit DAP protocol indicators:
+        # 1. DAP scheme: dap2:// or dap4:// (case-insensitive, may not be recognized by is_remote_uri)
+        # 2. Remote URI with /dap2/ or /dap4/ in URL path (case-insensitive)
+        # Note: We intentionally do NOT check for .dap suffix as that would match
+        # file extensions like .dap which trigger downloads of binary data
+        url_lower = filename_or_obj.lower()
+        if url_lower.startswith(("dap2://", "dap4://")):

Contributor Author

@Mikejmnez is it ok that this will accept both DAP2:// and dap2://?

Contributor

Yes, I tried it and it works with pydap. Same for DAP4 vs. dap4.

+            return True
+
+        # For standard remote URIs, check for DAP indicators in path
+        if is_remote_uri(filename_or_obj):
+            # Markers must be lowercase because they are matched against url_lower
+            return (
+                "/dap2/" in url_lower or "/dap4/" in url_lower or "/dodsc/" in url_lower
+            )
+
+        return False
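
The detection rules in this function can be exercised against the URLs from the documentation table with a small standalone sketch. `looks_remote` is only a rough approximation of `is_remote_uri` (assumed here to mean "starts with `scheme://`"), and all names are invented for illustration. The path markers are written in lowercase because they are compared against the lowercased URL.

```python
import re

DAP_SCHEMES = ("dap2://", "dap4://")
DAP_PATH_MARKERS = ("/dap2/", "/dap4/", "/dodsc/")  # lowercase on purpose


def looks_remote(url: str) -> bool:
    """Rough stand-in for is_remote_uri (assumption): any scheme:// prefix."""
    return re.match(r"^[a-z][a-z0-9+.\-]*://", url, flags=re.IGNORECASE) is not None


def looks_like_dap(url: object) -> bool:
    """Claim only URLs that carry an explicit DAP indicator."""
    if not isinstance(url, str):
        return False
    lowered = url.lower()
    # Explicit DAP scheme, matched case-insensitively
    if lowered.startswith(DAP_SCHEMES):
        return True
    # Ordinary remote URL with a DAP marker somewhere in the path
    if looks_remote(url):
        return any(marker in lowered for marker in DAP_PATH_MARKERS)
    return False


for url in (
    "dap2://opendap.nasa.gov/dataset",
    "https://example.com/DAP4/data",
    "https://example.com/data.nc",
    "https://example.com/store.zarr",
):
    print(url, looks_like_dap(url))
```

Local paths that happen to contain `/dap4/` are not claimed in this sketch, since the path markers are only consulted for remote URLs.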

     def open_dataset(
         self,