How can I specify the Unicode normalization when writing to NetCDF? #10498

Open
@tomchor

What is your issue?

Hi! Is there a way to specify Unicode normalization (e.g., NFD, NFC) when using to_netcdf()? I have variable names with Unicode characters and want to ensure consistent normalization.

Here's a simple example of the issue:

import xarray as xr

# Create a dataset with a Unicode variable name and save it
original_name = "a\u0304"  # "ā" written as "a" + combining macron (NFD)
xr.Dataset({original_name: ([], 1)}).to_netcdf("test.nc")

# Reload the file and compare the variable name
ds_loaded = xr.open_dataset("test.nc")
loaded_name = list(ds_loaded.variables.keys())[0]
original_name == loaded_name  # Returns False

After some digging, it seems that names are normalized to NFC when they are written to NetCDF. However, I'd like them normalized to NFD instead, since that matches the characters I can compose in IPython, Vim, VS Code, etc.
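To make the mismatch concrete, here is a minimal sketch using only the standard-library `unicodedata` module. The helper `renormalized_names` is a hypothetical name of my own, not part of xarray or netCDF; it only builds an old-name to new-name mapping for names whose spelling changes under a given normalization form.

```python
import unicodedata


def renormalized_names(names, form="NFD"):
    """Hypothetical helper: map each name to its `form`-normalized spelling.

    Names that are already in the requested form are left out of the mapping.
    """
    mapping = {}
    for name in names:
        if not isinstance(name, str):
            continue
        normalized = unicodedata.normalize(form, name)
        if normalized != name:
            mapping[name] = normalized
    return mapping


nfd_name = "a\u0304"  # "a" + combining macron (decomposed, NFD)
nfc_name = "\u0101"   # precomposed "ā" (NFC), as it appears to come back from the file

# The two spellings render the same but compare unequal as Python strings
assert nfd_name != nfc_name
assert unicodedata.normalize("NFC", nfd_name) == nfc_name

# Build a rename mapping back to NFD; plain ASCII names are skipped
rename_map = renormalized_names([nfc_name, "time"])
```

If this is right, something like `ds_loaded.rename(renormalized_names(ds_loaded.variables))` might work as a stopgap after loading, but that only fixes the in-memory names and is not the same as controlling the normalization at write time, which is what I'm really asking about.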

Thanks!
