Fix critical `np.timedelta64` encoding bugs #10469

spencerkclark · 2025-06-30T00:10:31Z

This PR fixes the critical np.timedelta64 encoding bugs introduced in #10101. We now always encode np.timedelta64 values with a dtype attribute corresponding to the in-memory dtype, and use the same encoding path that we did previously, which by default selects the coarsest units that support integer serialization. This enables storing "timedelta64[ns]" values in netCDF3 format, which was not supported by the "literal" encoding approach implemented in #10101.
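To make the default concrete, here is a minimal sketch of choosing the coarsest units that still allow integer serialization. It is an illustration only, not xarray's implementation: the `UNITS_NS` table and the `coarsest_integer_units` helper are hypothetical names, and the values are plain integer nanosecond counts rather than arrays.

```python
# Hypothetical illustration; xarray's real logic lives in encode_cf_timedelta.
# Candidate units, coarsest first, with their length in nanoseconds.
UNITS_NS = [
    ("days", 86_400_000_000_000),
    ("hours", 3_600_000_000_000),
    ("minutes", 60_000_000_000),
    ("seconds", 1_000_000_000),
    ("milliseconds", 1_000_000),
    ("microseconds", 1_000),
    ("nanoseconds", 1),
]


def coarsest_integer_units(values_ns):
    """Return (units, encoded) using the coarsest unit that divides every value."""
    for units, size in UNITS_NS:
        if all(value % size == 0 for value in values_ns):
            return units, [value // size for value in values_ns]
    # Unreachable for integer input: the last entry (nanoseconds) divides everything.
    raise ValueError("values_ns must contain integers")


HOUR_NS = 3_600 * 1_000_000_000
units, encoded = coarsest_integer_units([HOUR_NS, 36 * HOUR_NS])

# The in-memory dtype is recorded separately from the on-disk units.
attrs = {"dtype": "timedelta64[ns]", "units": units}
print(units, encoded, attrs)
```

In this sketch the data stays `timedelta64[ns]` in memory while the values written to disk are small integers in hours, which is what lets them fit in formats with narrower integer types.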

For consistency with the previous units-based decoding approach, this update also enables controlling the decoded resolution of dtype-decoded values via the time_unit parameter of the CFTimedeltaCoder class. The time_unit parameter now defaults to None. If time_unit is None, the dtype attribute determines the dtype the data is decoded to; otherwise time_unit takes precedence.
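The precedence rule can be sketched as follows. The `resolve_decoded_dtype` helper is a hypothetical name used only for illustration and is not part of xarray; it assumes the on-disk attributes are available as a plain dictionary.

```python
def resolve_decoded_dtype(attrs, time_unit=None):
    """Pick the timedelta64 dtype to decode to (hypothetical helper)."""
    if time_unit is not None:
        # An explicit time_unit takes precedence over the on-disk attribute.
        return f"timedelta64[{time_unit}]"
    # Otherwise the dtype attribute written at encoding time decides.
    return attrs["dtype"]


attrs = {"dtype": "timedelta64[ms]", "units": "hours"}
print(resolve_decoded_dtype(attrs))                 # follows the dtype attribute
print(resolve_decoded_dtype(attrs, time_unit="s"))  # follows time_unit
```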

- Closes #10466 (Writing timedelta64 to netCDF3 always raises an error)
- Closes #10468 (Cannot load timedelta64 encoded data from disk)
- Tests added
- User visible changes (including notable bug fixes) are documented in whats-new.rst

cc: @shoyer @kmuehlbauer; sorry again for the trouble.

shoyer

Thanks, Spencer!

xarray/coding/times.py

xarray/tests/test_backends.py

xarray/backends/netcdf3.py

shoyer · 2025-06-30T01:24:48Z

One other strategic question -- do we need this separate logic for decoding into an integer dtype that exactly matches the precision of the datetime64 units?

xarray/xarray/coding/times.py

Lines 1456 to 1487 in 4a581f4

    
    else:
        resolution, _ = np.datetime_data(variable.dtype)
        dtype = np.int64
        attrs_dtype = f"timedelta64[{resolution}]"
        units = _numpy_dtype_to_netcdf_timeunit(variable.dtype)
        safe_setitem(attrs, "dtype", attrs_dtype, name=name)
        # Remove dtype encoding if it exists to prevent it from
        # interfering downstream in NonStringCoder.
        encoding.pop("dtype", None)
        if any(
            k in encoding for k in _INVALID_LITERAL_TIMEDELTA64_ENCODING_KEYS
        ):
            raise ValueError(
                f"Specifying 'add_offset' or 'scale_factor' is not "
                f"supported when encoding the timedelta64 values of "
                f"variable {name!r} with xarray's new default "
                f"timedelta64 encoding approach. To encode {name!r} "
                f"with xarray's previous timedelta64 encoding "
                f"approach, which supports the 'add_offset' and "
                f"'scale_factor' parameters, additionally set "
                f"encoding['units'] to a unit of time, e.g. "
                f"'seconds'. To proceed with encoding of {name!r} "
                f"via xarray's new approach, remove any encoding "
                f"entries for 'add_offset' or 'scale_factor'."
            )
        if "_FillValue" not in encoding and "missing_value" not in encoding:
            encoding["_FillValue"] = np.iinfo(np.int64).min
    data, units = encode_cf_timedelta(data, units, dtype)
    safe_setitem(attrs, "units", units, name=name)
    return Variable(dims, data, attrs, encoding, fastpath=True)

I am wondering because it seems like this logic is already covered in encode_cf_timedelta(), and adding this explicit check means that timedelta64[ns] data cannot be saved directly to netCDF3 files unless the units are converted first. timedelta64[ns] worked via encode_cf_timedelta as long as nanosecond precision wasn't really needed, similar to how datetime64[ns] is OK. This was convenient for users of "human" time units.
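A small sketch of why the choice of units matters here, assuming (as in the netCDF classic data model) that the widest integer type available is 32-bit. The `fits_in_int32` helper is a hypothetical name for illustration, not an xarray function.

```python
# Assumption: the target format offers at most 32-bit integers.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def fits_in_int32(values):
    """Return True if every integer in values is representable as int32."""
    return all(INT32_MIN <= value <= INT32_MAX for value in values)


HOUR_NS = 3_600 * 1_000_000_000
durations_hours = [1, 6, 24]
durations_ns = [hours * HOUR_NS for hours in durations_hours]

print(fits_in_int32(durations_ns))     # False: one hour is already 3.6e12 ns
print(fits_in_int32(durations_hours))  # True: the same durations counted in hours
```

Encoding "human" durations in coarse units keeps the stored integers small, whereas counting the same durations in nanoseconds overflows 32 bits almost immediately.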

spencerkclark

Thanks for the quick feedback @shoyer!

I am wondering because it seems like this logic is already covered in encode_cf_timedelta(), and adding this explicit check means that timedelta64[ns] data cannot be saved directly to netCDF3 files unless the units are converted first. timedelta64[ns] worked via encode_cf_timedelta as long as nanosecond precision wasn't really needed, similar to how datetime64[ns] is OK. This was convenient for users of "human" time units.

I agree this is annoying, and it seems like we should be able to relax it some. I'll think about it some more and see what I can do.

xarray/backends/netcdf3.py

xarray/coding/times.py

xarray/tests/test_backends.py

Always write a dtype attribute to disk regardless of how the timedeltas were decoded.

for more information, see https://pre-commit.ci

dcherian · 2025-07-01T16:20:05Z

Would we be open to renaming the attribute that encodes the timedelta64 dtype "xarray_dtype"?

I made a similar proposal for Intervals here: #8005 (comment)

xarray/tests/test_coding_times.py

xarray/coding/times.py

spencerkclark · 2025-07-01T18:16:35Z

Would we be open to renaming the attribute that encodes the timedelta64 dtype "xarray_dtype"?

I made a similar proposal for Intervals here: #8005 (comment)

Thanks @dcherian—Stephan's suggestion of always writing a dtype attribute to disk, regardless of how the data existed in the original file, helped obviate the need for renaming this attribute in this PR. The name dtype has some precedent, e.g. when coding boolean values, which is why it was initially chosen in #10101:

xarray/xarray/coding/variables.py

Lines 582 to 594 in 55dc766

    
    def encode(self, variable: Variable, name: T_Name = None) -> Variable:
        if (
            (variable.dtype == bool)
            and ("dtype" not in variable.encoding)
            and ("dtype" not in variable.attrs)
        ):
            dims, data, attrs, encoding = unpack_for_encoding(variable)
            attrs["dtype"] = "bool"
            data = duck_array_ops.astype(data, dtype="i1", copy=True)
            return Variable(dims, data, attrs, encoding, fastpath=True)
        else:
            return variable
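For readers unfamiliar with this pattern, the round trip can be sketched with plain NumPy arrays and a dictionary of attributes. The `encode_bool` and `decode_bool` functions below are hypothetical stand-ins written for illustration; they are not xarray's coder classes and skip the encoding-dictionary checks shown above.

```python
import numpy as np


def encode_bool(data, attrs):
    """Store booleans as int8 and remember the original dtype in attrs."""
    if data.dtype == bool and "dtype" not in attrs:
        return data.astype("i1"), {**attrs, "dtype": "bool"}
    return data, attrs


def decode_bool(data, attrs):
    """Restore booleans when the dtype attribute says the data was bool."""
    attrs = dict(attrs)  # copy so the caller's attributes are left untouched
    if attrs.get("dtype") == "bool":
        del attrs["dtype"]
        return data.astype(bool), attrs
    return data, attrs


original = np.array([True, False, True])
encoded, encoded_attrs = encode_bool(original, {})
decoded, decoded_attrs = decode_bool(encoded, encoded_attrs)
print(encoded.dtype, encoded_attrs, decoded.dtype, decoded_attrs)
```

The timedelta64 case follows the same idea: the on-disk values are plain integers, and the `dtype` attribute records what they should become again on load.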

shoyer · 2025-07-01T22:37:51Z

Thanks, Spencer, this looks great.

I can confirm this fixes a few Google-related projects that were using timedelta64 serialization.

kmuehlbauer

Thanks @spencerkclark! I'm always hoping that it will be the last time we have to touch these lines of code. Great! 🎉

spencerkclark · 2025-07-02T11:48:47Z

Thanks @shoyer and @kmuehlbauer for the quick reviews, and confirming that it works in downstream projects. I think this is in a much better state (not least because it now actually works as advertised...). I will go ahead and merge.

TomNicholas · 2025-07-02T21:04:13Z

Guys, this was merged with failing tests, and the failures appear to have somehow been introduced in this commit of this PR. Please see zarr-developers/zarr-python#3199, and let us know if you have any idea why changes to encoding would affect zarr like this... 😕

spencerkclark · 2025-07-02T23:19:15Z

Sorry about that! I wrongly assumed those failures were unrelated, since the tests passed earlier in the development of this PR. I can reproduce the failures locally, but I'm still a bit puzzled. They seem related to my parametrizing this test with a time_unit argument, which I only did in a later commit (if I remove that, these tests succeed).

Ah as I'm writing this it looks like you've sorted it out (#10492) — thanks!

Fix literal timedelta encoding bugs

4a581f4

github-actions bot added topic-backends topic-documentation topic-cftime io labels Jun 30, 2025

shoyer reviewed Jun 30, 2025

spencerkclark commented Jun 30, 2025

spencerkclark changed the title ~~Fix critical literal np.timedelta64 encoding bugs~~ Fix critical np.timedelta64 encoding bugs Jul 1, 2025

spencerkclark force-pushed the fix-timedelta-coding branch from e189b66 to 624625b Compare July 1, 2025 15:52

Unify timedelta64 coding logic between the old and new approaches

bdda733

Always write a dtype attribute to disk regardless of how the timedeltas were decoded.

spencerkclark force-pushed the fix-timedelta-coding branch from 624625b to bdda733 Compare July 1, 2025 15:54

pre-commit-ci bot and others added 2 commits July 1, 2025 15:55

[pre-commit.ci] auto fixes from pre-commit.com hooks

153038e

for more information, see https://pre-commit.ci

Merge branch 'main' into fix-timedelta-coding

e5a0a85

shoyer reviewed Jul 1, 2025

Add decode_timedelta=True test case

0542961

shoyer approved these changes Jul 1, 2025

Merge branch 'main' into fix-timedelta-coding

f07fb4e

kmuehlbauer approved these changes Jul 2, 2025

Merge branch 'main' into fix-timedelta-coding

7413588

spencerkclark merged commit 516ec07 into pydata:main Jul 2, 2025
24 of 33 checks passed

spencerkclark deleted the fix-timedelta-coding branch July 2, 2025 11:50

ianhi mentioned this pull request Jul 2, 2025

Fix Zarr 'number of requests' test #10492

Merged

1 task
