-
Notifications
You must be signed in to change notification settings - Fork 281
PoC(tap): Succinct hashed bin delegations #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC(tap): Succinct hashed bin delegations #1106
Conversation
lukpueh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a stab at the hashed bin delegation POC. Before I continue reviewing I have a high-level question. Why do we have different naming conventions for normal hashed bin delegated roles and succinct hashed bin delegated roles?
It would save us a lot of implementation trouble, if we used the same convention. And I don't see a good reason why we shouldn't.
IIUC the delegated role names serve the same purpose in both cases, that is to indicate what range of hashed bins the role is responsible for.
The main advantage of the new naming scheme for succinct hbd, i.e. "<DELEGATING_ROLE>.hbd-(0..2^BIT_LENGTH-1)", compared to the naming scheme of normal hbd, i.e. "<LOW>-<HIGH>", is that it adds the ability of different roles delegating to different hashed bins in one repository. Is there a use case for this? It seems more like a side-effect of the TAP than an original motivation. Also, if we didn't need that flexibility for normal hbd, why add it now for succinct hbd?
Please let me know if I'm missing something.
tuf/formats.py
Outdated
| PATH_HASH_PREFIXES_SCHEMA = SCHEMA.ListOf(PATH_HASH_PREFIX_SCHEMA) | ||
|
|
||
| SUCCINCT_HASH_DELEGATIONS_SCHEMA = SCHEMA.Object( | ||
| prefix_bit_length = SCHEMA.Integer(lo=0, hi=32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the tap defines it this way, so this comment is probably a better fit for theupdateframework/taps#120, but is it really necessary to use an object/dict with only one item? Can't we just assign the prefix bit length value directly to the succinct_hash_delegations field (and maybe rethink its name)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the object with one item feels a bit redundant. I'll post in the TAP to propose a change.
tuf/repository_tool.py
Outdated
| prefix_len = role['succinct_hash_delegations']['prefix_bit_length'] | ||
| num_bins = 2 ** prefix_len | ||
| for i in range(num_bins): | ||
| bin_name = role['name'] + '-' + "{num:0{len}x}".format(num=i, len=prefix_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Programming-style nit: I suggest to only use one of format strings or string concatenation, e.g.:
"{name}-{num:0{len}x}".format(name=role['name'], num=i, len=prefix_len)
Yes, I believe the intention of the new naming scheme is to allow for different roles to create independent hashed bin delegations. I don't know that this ability has been requested by any existing implementer, but I can imagine a use case for registries as different uploaders to the same registry may each want to use succinct delegations. Does this seem like a likely use case? If not, we should probably stick with the existing naming scheme. |
Yes, different parties need to be able to use this mechanism. I agree we must not omit the <DELEGATING_ROLE> from the name. |
|
Thanks for addressing my comments and taking the naming discussion to the TAP page. Could you please also make sure to follow (our) commit guidelines. |
|
I still think that we should harmonize the naming conventions, in order to reduce the delegated bin role name case handling in the code. Since the traditional hashed bin delegation does not rely on a name convention (at least not on the client), we could:
|
tuf/repository_tool.py
Outdated
| succinct: | ||
| Whether the delegated role is used by succinct hashed bin delegations. | ||
| If succinct delegations are used, the role is added to '_succinct_delegations' | ||
| instead of '_delegated_roles'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
I am also in favour of using consistent hashed bin naming conventions |
|
The idea behind my suggestion from above was that we could use the same (slightly updated) hash bin helper functions for both types of hashed bin delegation. So that we don't have to concatenate strings / construct names anywhere else. def create_bin_name(bin_info, prefix=""):
"""Returns prefix + suffix.
Suffix is computed based on the passed bin identifying info and follows a convention ,
e.g. lo-hi of hex hash target prefix, or binary target prefix, or bin index, etc. """
...
def find_bin_for_target_hash(target_hash, bins_info):
"""Returns either the bin name or abstract bin info that can be used to create the name.
bins_info is used to find out how many bins there are and what target prefixes each bin
serves. It could be number of bins, or hash prefix bit length, etc."""
... |
826626e to
808047a
Compare
|
@lukpueh I updated this pr to try to simplify the naming convention, as well as rebasing onto develop. One thing to note is that this updates the default way that non-succinct hashed bins are named. I don't think this breaks backwards compatibility as the previous naming convention (or any arbitrary one) is still supported by the client. |
|
Thanks for the updates, @mnm678! I agree with you that the update shouldn't be a problem for clients. It is still a backwards incompatible change to the repository tool though. But I don't think that's a big problem. Although, we could just keep using the hash-prefix-range as role name suffix for both classic and succinct hbd, then we wouldn't have a problem at all. Also, I noticed that you still repeat the bin role name creation in quite a few places (see various occurrences of |
|
@mnm678, please ping me when you have addressed my comments and the PR is ready for another review, or if you need any help. :) Also, could you please try to stick closer to our git commit and history guidelines (separation of commits, commit message wording and line length, no merge commits in feature branch, etc...). Thanks! |
f494684 to
a4a77c1
Compare
This commit adds the repository metadata generation for succinct hashed bin delegations. Signed-off-by: marinamoore <[email protected]>
This commit updates support for succinct hashed bin delegations in delegate and delegate_hashed_bins to create role metadata for succinct delegations. Signed-off-by: marinamoore <[email protected]>
…h repository.writeall() This does not include support for loading a repository containing succinct hashed bin delegations. Signed-off-by: marinamoore <[email protected]>
This commit updates load_repository to properly handle succinct hashed bin delegations. Signed-off-by: marinamoore <[email protected]>
Signed-off-by: marinamoore <[email protected]>
…elegations. Signed-off-by: marinamoore <[email protected]>
…ames Signed-off-by: marinamoore <[email protected]>
Roles used by succinct hashed bin delegations are not listed in the delegating metadata, so cannot be added to _delegated_roles. Therefore the _succinct_delegations field manages these roles. Signed-off-by: marinamoore <[email protected]>
This commit replaces:
("succinct_hash_delegations" : {
"prefix_bit_length" : BIT_LENGTH
})
with
("delegation_hash_prefix_len": BIT_LENGTH)
to reflect an update to the TAP and to simplify the implementation.
Signed-off-by: marinamoore <[email protected]>
This commit allows hashed bin delegations (both succinct and otherwise) to be created with a custom prefix. If the prefix is not provided, the existing defaults will be used. As part of this addition, this commit cleans up some of the checks for succinct hashed bin delegations to check for the existence of the delegation_hash_prefix_len field rather than using the format of the name to determine the type of delegation. Signed-off-by: marinamoore <[email protected]>
…the client To reflect the changes to the TAP and to the repository implementation, this commit uses the delegation_hash_prefix_len notation for succinct hashed bin delegations. In addition, this commit updates the naming convention used by the client for succinct hashed bin delegations. Signed-off-by: marinamoore <[email protected]>
These tests ensure that the custom prefix names provided in delegate_hashed_bins are used to name bin roles. Signed-off-by: marinamoore <[email protected]>
Use the same naming convention for both succinct and non-succinct hashed bins by taking in a prefix, or by default using delegating_rolename.hbd- This means that all bins will be named with a count instead of listing the prefix range in the name Signed-off-by: marinamoore <[email protected]>
Make sure that the keyids for succinct delegations are properly loaded into the repository. Signed-off-by: marinamoore <[email protected]>
Update the repo_lib helper functions to use a prefix and create a name using a zero-padded count instead of the prefix range. So the name will be of the format prefix-count (for example bin.hbd-02). In addition, this updates the repository tool to use the updated repo_lib functions to avoid duplicate code. Signed-off-by: marinamoore <[email protected]>
This updates the succinct hash bin delegation tests to use the new naming convention. It also fixes a small variable reuse issue in repository lib Signed-off-by: marinamoore <[email protected]>
a4a77c1 to
4a732b9
Compare
|
@lukpueh I think this is ready for another look. I did some git history cleanup, but can make some more updates to the commit messages if needed. |
joshuagl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work here @mnm678, thank you.
I have a few questions and comments in line. Most significantly I wonder if the TAP requires that we change the way we name hbd. Could we support succinct hashed bin delegations with the hash-prefix range as calculated by the current code as the rolename suffix? That way we could harmonise our hashed bin code and not introduce a backwards incompatible change to repository_tool?
Hopefully this could help us implement succinct hashed bin delegations with fewer branches too.
tuf/repository_tool.py
Outdated
| bins by calculating the target path's hash prefix. | ||
| succinct: | ||
| If the delegation is to bins using succinct hashed bin delegations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is out of date? Aren't we expecting an integer argument here?
tuf/repository_tool.py
Outdated
| hashed bin delegations. Targets may be located and stored in hashed | ||
| bins by calculating the target path's hash prefix. | ||
| succinct: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| succinct: | |
| succinct_hash_delegations: |
tuf/repository_lib.py
Outdated
| zero-padded (up to prefix_len) strings i.e. for low=00, high=07, | ||
| prefix_len=3 the returned name would be '000-007'. | ||
| Create a string name of a delegated hash bin, where name will be | ||
| prefix-NUM where NUM will be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment implies we insert the '-', but we don't appear to be doing that.
tuf/repository_lib.py
Outdated
| Create a string name of a delegated hash bin, where name will be | ||
| prefix-NUM where NUM will be a | ||
| zero-padded (up to prefix_len) string i.e. for bin_num=0, bin_num_length=2 | ||
| prefix="pre", the name will be pre-00 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, implies we're inserting the '-'.
tuf/repository_tool.py
Outdated
| logger.debug(repr(rolename) + ' already exists.') | ||
| if succinct: | ||
| if rolename in self._succinct_delegations: | ||
| logger.debug(repr(rolename) + ' already exists.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to ignore this? What happens if we load a repository with succinct hashed bin delegations and want to change i.e. the number of succinct hashed bins?
Is there any harm in adding the role to self._succinct_delegations even when it's already present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I think replacing any existing value makes more sense.
| self._succinct_delegations[rolename] = targets_object | ||
|
|
||
| else: | ||
| self._delegated_roles[rolename] = targets_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just leave this as-is and always assign. The debug info isn't useful, I don't think.
tuf/repository_tool.py
Outdated
| "threshold": 1, | ||
| "terminating": False, | ||
| "target_paths": []} | ||
| ordered_roles.append(role) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The role is already appended at the end of the loop, outside of the conditional. I think we're adding the role to ordered_roles twice?
| ordered_roles.append(role) |
tuf/repository_tool.py
Outdated
| "keyids": keyids, | ||
| "threshold": 1, | ||
| "terminating": False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not required? We're setting these defaults later when iterating over ordered_roles
tuf/repository_tool.py
Outdated
| 'version':1, | ||
| 'threshold': 1, | ||
| 'terminating': False, | ||
| 'paths': []} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are an empty paths and the version required? If so we should add them for non-succinct hbd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, we don't appear to be doing anything with this roleinfo dict? Let's just delete this else clause?
Thanks for the review @joshuagl! The main downside I see to using the current naming scheme is that it could make very long filenames if the prefix is sufficiently long. For example |
Co-authored-by: Joshua Lock <[email protected]> Signed-off-by: marinamoore <[email protected]>
Make sure that docstrings match the functions Signed-off-by: marinamoore <[email protected]>
Remove unnecessary or duplicate code in response to feedback on the pr. This also removes the existance check in add_delegated_role so that the value passed will replace any existing delegation. This allows the caller to replace an outdated role using this function. Signed-off-by: marinamoore <[email protected]>
b1d22cc to
589840e
Compare
This commit creates a 'succinct_hash_delegations' field that contains 'delegation_hash_prefix_len' and 'bin_name_prefix' to be used for succinct hashed bin delegations. When this field is present, the 'name' field is no longer needed, and so is removed from the metadata. For more detail, see the succinct hashed bin delegation TAP pr. Signed-off-by: marinamoore <[email protected]>
| for delegated_role in roleinfo['delegations']['roles']: | ||
| delegated_roles.append(delegated_role['name']) | ||
| if 'succinct_hash_delegations' in delegated_role: | ||
| delegated_roles.append(delegated_role['succinct_hash_delegations']['bin_name_prefix']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the prefix enough here, or do we need the full role name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the expected behavior here would be. A single succinct delegation in the metadata corresponds to many actual delegations, so should this function return names of the delegated roles in the metadata file (what it currently does), or the names of all roles delegated to by the current role (including the full names of all succinct delegations)?
Either way, I'll be sure to document it in the docstring.
|
As this is a proof of concept for a TAP I've converted it to a draft PR. |
|
Now with the legacy code gone this PR will need quite a re-write. Most of above comments were either relevant to the legacy code, or to the TAP itself, and have been addressed in the taps repo. So I suggest to close here and re-open a fresh PR. For the repository side, since we currently don't have a repository tool, it's probably best to prepare something like the hashed bin delegation example, and the client-side code should work with |
This pr adds support for succinct hashed bin delegations as described in theupdateframework/taps#120.