Skip to content

Fix DirectoryStore #773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 20, 2021
Prev Previous commit
Next Next commit
Migrate key logic to core rather than storage
Previous tests (now commented out) used logic in the store
classes to convert "0/0" keys into "0.0" keys, forcing the
store to be aware of array details. This tries to swap the
logic so that stores are responsible for passing dimension
separator values down to the arrays only. Since arrays can
also get the dimension_separator value from a .zarray file
they are now in charge.
  • Loading branch information
joshmoore committed Jun 14, 2021
commit e1835667116b3f86fe002183a6b527565743795c
2 changes: 1 addition & 1 deletion zarr/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -1952,7 +1952,7 @@ def _process_for_setitem(self, ckey, chunk_selection, value, fields=None):
return self._encode_chunk(chunk)

def _chunk_key(self, chunk_coords):
return self._key_prefix + '.'.join(map(str, chunk_coords))
return self._key_prefix + self._dimension_separator.join(map(str, chunk_coords))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this change is compatible with FSStore._normalize_key? https://github.com/zarr-developers/zarr-python/blob/master/zarr/storage.py#L1076

FSStore._normalize_key assumes that all chunk keys are formatted foo/bar/0.0.0 -- this assumption is the basis of splitting the chunk key into a prefix and a chunk ID via key.split('/'). As I understand it, this change breaks this assumption.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the flat and nested fixtures from this repo (zarr.open(f"file:///tmp/{x}")[:]) with some sloppy debugging in place shows:

|               | Array._chunk_key | FSStore._normalize_keys |
|---------------|------------------|-------------------------|
| master:nested |  (0, 0) --> 0.0  |       0.0 --> 0.0       |
| master:flat   |  (0, 0) --> 0.0  |       0.0 --> 0.0       |
| PR:nested     |  (0, 0) --> 0/0  |       0/0 --> 0/0       |
| PR:flat       |  (0, 0) --> 0.0  |       0.0 --> 0.0       |

which likely points to some logic in FSStore being ripe for removal since the Store is basically just accepting what what the Array has detected. Now, how it is that that's working with your PR, I still haven't figured out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, as your test shows this is fine for FSStore (and maybe we don't need this code in the store at all if the chunk keys come pre-normalized). But this situation is dire for N5Stores, which need to be able to re-order the chunk keys before writing to storage.


def _decode_chunk(self, cdata, start=None, nitems=None, expected_shape=None):
# decompress
Expand Down
8 changes: 0 additions & 8 deletions zarr/storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -803,10 +803,6 @@ def __init__(self, path, normalize_keys=False, dimension_separator=None):
def _normalize_key(self, key):
return key.lower() if self.normalize_keys else key

def _optionally_nested(self, key):
return self._dimension_separator == "/" and \
_nested_map_ckey(key) or key

def _fromfile(self, fn):
""" Read data from a file

Expand Down Expand Up @@ -842,7 +838,6 @@ def _tofile(self, a, fn):
f.write(a)

def __getitem__(self, key):
key = self._optionally_nested(key)
key = self._normalize_key(key)
filepath = os.path.join(self.path, key)
if os.path.isfile(filepath):
Expand All @@ -851,7 +846,6 @@ def __getitem__(self, key):
raise KeyError(key)

def __setitem__(self, key, value):
key = self._optionally_nested(key)
key = self._normalize_key(key)

# coerce to flat, contiguous array (ideally without copying)
Expand Down Expand Up @@ -893,7 +887,6 @@ def __setitem__(self, key, value):
os.remove(temp_path)

def __delitem__(self, key):
key = self._optionally_nested(key)
key = self._normalize_key(key)
path = os.path.join(self.path, key)
if os.path.isfile(path):
Expand All @@ -906,7 +899,6 @@ def __delitem__(self, key):
raise KeyError(key)

def __contains__(self, key):
key = self._optionally_nested(key)
key = self._normalize_key(key)
file_path = os.path.join(self.path, key)
return os.path.isfile(file_path)
Expand Down
8 changes: 4 additions & 4 deletions zarr/tests/test_storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -1165,10 +1165,10 @@ def test_chunk_nesting(self):
# any path where last segment looks like a chunk key gets special handling
store['0.0'] = b'xxx'
assert b'xxx' == store['0.0']
assert b'xxx' == store['0/0']
# assert b'xxx' == store['0/0']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we loose there? Doesn't the store normalise "." -> "/" anyway?

Copy link
Member Author

@joshmoore joshmoore Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to describe in the commit message on e183566, at least to the best of my understanding:

Previous tests (now commented out) used logic in the store classes to convert "0/0" keys into "0.0" keys, forcing the store to be aware of array details. This tries to swap the logic so that stores are responsible for passing dimension separator values down to the arrays only. Since arrays can also get the dimension_separator value from a .zarray file they are now in charge.

store['foo/10.20.30'] = b'yyy'
assert b'yyy' == store['foo/10.20.30']
assert b'yyy' == store['foo/10/20/30']
# assert b'yyy' == store['foo/10/20/30']
store['42'] = b'zzz'
assert b'zzz' == store['42']

Expand Down Expand Up @@ -1213,12 +1213,12 @@ def test_chunk_nesting(self):
store['0.0'] = b'xxx'
assert '0.0' in store
assert b'xxx' == store['0.0']
assert b'xxx' == store['0/0']
# assert b'xxx' == store['0/0']
store['foo/10.20.30'] = b'yyy'
assert 'foo/10.20.30' in store
assert b'yyy' == store['foo/10.20.30']
# N5 reverses axis order
assert b'yyy' == store['foo/30/20/10']
# assert b'yyy' == store['foo/30/20/10']
store['42'] = b'zzz'
assert '42' in store
assert b'zzz' == store['42']
Expand Down