Skip to content

Prevent pydap (dap4) to change string arrays to unicode type (testing). Fixed upstream #10482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion ci/requirements/min-all-deps.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ dependencies:
- pandas=2.2
- pint=0.24
- pip
- pydap=3.5
- pydap=3.5.0
- pytest
- pytest-cov
- pytest-env
Expand Down
2 changes: 2 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,8 @@ Bug fixes
are passed (:pull:`10440`). By `Mathias Hauser <https://github.com/mathause>`_.
- Fix :py:func:`testing.assert_equal` with ``check_dim_order=False`` for :py:class:`DataTree` objects
(:pull:`10442`). By `Mathias Hauser <https://github.com/mathause>`_.
- Fix Pydap backend testing. Now test forces string arrays to dtype "S" (pydap converts them to unicode type by default). Removes conditional to numpy version. (:issue:`10261`, :pull:`10482`)
By `Miguel Jimenez-Urias <https://github.com/Mikejmnez>`_.


Documentation
Expand Down
22 changes: 12 additions & 10 deletions xarray/tests/test_backends.py
Original file line number Diff line number Diff line change
Expand Up @@ -3557,7 +3557,7 @@
patches = self.make_patches(store)
with patch.multiple(KVStore, **patches):
original.to_zarr(store)
self.check_requests(expected, patches)

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.12 all-but-dask

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.11

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.13

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.13 all-but-numba

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.11

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

Check failure on line 3560 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.13

TestInstrumentedZarrStore.test_append AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 1, 'set': 16}) assert 8 <= 4

patches = self.make_patches(store)
# v2024.03.0: {'iter': 6, 'contains': 2, 'setitem': 5, 'getitem': 10, 'listdir': 6, 'list_prefix': 0}
Expand Down Expand Up @@ -3639,7 +3639,7 @@
patches = self.make_patches(store)
with patch.multiple(KVStore, **patches):
ds.to_zarr(store, mode="w", compute=False)
self.check_requests(expected, patches)

Check failure on line 3642 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.11

TestInstrumentedZarrStore.test_region_write AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 4, 'set': 16}) assert 8 <= 2

Check failure on line 3642 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / macos-latest py3.13

TestInstrumentedZarrStore.test_region_write AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 4, 'set': 16}) assert 8 <= 2

Check failure on line 3642 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.13 all-but-numba

TestInstrumentedZarrStore.test_region_write AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 4, 'set': 16}) assert 8 <= 2

Check failure on line 3642 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.11

TestInstrumentedZarrStore.test_region_write AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 4, 'set': 16}) assert 8 <= 2

Check failure on line 3642 in xarray/tests/test_backends.py

View workflow job for this annotation

GitHub Actions / ubuntu-latest py3.13

TestInstrumentedZarrStore.test_region_write AssertionError: ('get', {'get': 8, 'list_dir': 2, 'list_prefix': 4, 'set': 16}) assert 8 <= 2

# v2024.03.0: {'iter': 5, 'contains': 2, 'setitem': 1, 'getitem': 6, 'listdir': 5, 'list_prefix': 0}
# 6057128b: {'iter': 4, 'contains': 2, 'setitem': 1, 'getitem': 5, 'listdir': 4, 'list_prefix': 0}
Expand Down Expand Up @@ -5437,7 +5437,9 @@

ds = DatasetType("bears", **original.attrs)
for key, var in original.data_vars.items():
ds[key] = BaseType(key, var.values, dims=var.dims, **var.attrs)
ds[key] = BaseType(
key, var.values, dtype=var.values.dtype.kind, dims=var.dims, **var.attrs
)
# check all dims are stored in ds
for d in original.coords:
ds[d] = BaseType(d, original[d].values, dims=(d,), **original[d].attrs)
Expand All @@ -5449,9 +5451,9 @@
# print("QQ0:", expected["bears"].load())
pydap_ds = self.convert_to_pydap_dataset(expected)
actual = open_dataset(PydapDataStore(pydap_ds))
if Version(np.__version__) < Version("2.3.0"):
# netcdf converts string to byte not unicode
expected["bears"] = expected["bears"].astype(str)
# netcdf converts string to byte not unicode
# fixed in pydap 3.5.6. https://github.com/pydap/pydap/issues/510
actual["bears"].values = actual["bears"].values.astype("S")
yield actual, expected

def test_cmp_local_file(self) -> None:
Expand Down Expand Up @@ -5495,9 +5497,6 @@
with create_tmp_file() as tmp_file:
actual.to_netcdf(tmp_file)
with open_dataset(tmp_file) as actual2:
if Version(np.__version__) < Version("2.3.0"):
# netcdf converts string to byte not unicode
actual2["bears"] = actual2["bears"].astype(str)
assert_equal(actual2, expected)

@requires_dask
Expand All @@ -5515,9 +5514,11 @@
# in pydap 3.5.0, urls defaults to dap2.
url = "http://test.opendap.org/opendap/data/nc/bears.nc"
actual = open_dataset(url, engine="pydap", **kwargs)
# pydap <3.5.6 converts to unicode dtype=|U. Not what
# xarray expects. Thus force to bytes dtype. pydap >=3.5.6
# does not convert to unicode. https://github.com/pydap/pydap/issues/510
actual["bears"].values = actual["bears"].values.astype("S")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand: pydap is changing behaviour and we are enforcing that new behaviour always?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding, too.

Copy link
Contributor Author

@Mikejmnez Mikejmnez Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sort of... It was more like pydap was always enforcing a change from "S" --> "U", which was not jiving with xarray. Beginning with the new version it will no longer enforce the change to unicode. Once the minimal pydap version moves up from 3.5.0 --> 3.5.6 this line won't be necessary.

with open_example_dataset("bears.nc") as expected:
# workaround to restore string which is converted to byte
expected["bears"] = expected["bears"].astype(str)
yield actual, expected

def output_grid_deprecation_warning_dap2dataset(self):
Expand All @@ -5530,7 +5531,8 @@
actual = open_dataset(url, engine="pydap", **kwargs)
with open_example_dataset("bears.nc") as expected:
# workaround to restore string which is converted to byte
expected["bears"] = expected["bears"].astype(str)
# only needed for pydap <3.5.6 https://github.com/pydap/pydap/issues/510
expected["bears"].values = expected["bears"].values.astype("S")
yield actual, expected

def test_session(self) -> None:
Expand Down
Loading