Skip to content

Conversation

@mnm678
Copy link
Contributor

@mnm678 mnm678 commented Aug 11, 2020

This pr adds support for succinct hashed bin delegations as described in theupdateframework/taps#120.

@mnm678 mnm678 marked this pull request as ready for review August 12, 2020 20:48
@mnm678 mnm678 requested a review from lukpueh August 12, 2020 20:48
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at the hashed bin delegation POC. Before I continue reviewing I have a high-level question. Why do we have different naming conventions for normal hashed bin delegated roles and succinct hashed bin delegated roles?

It would save us a lot of implementation trouble, if we used the same convention. And I don't see a good reason why we shouldn't.

IIUC the delegated role names serve the same purpose in both cases, that is to indicate what range of hashed bins the role is responsible for.

The main advantage of the new naming scheme for succinct hbd, i.e. "<DELEGATING_ROLE>.hbd-(0..2^BIT_LENGTH-1)", compared to the naming scheme of normal hbd, i.e. "<LOW>-<HIGH>", is that it adds the ability of different roles delegating to different hashed bins in one repository. Is there a use case for this? It seems more like a side-effect of the TAP than an original motivation. Also, if we didn't need that flexibility for normal hbd, why add it now for succinct hbd?

Please let me know if I'm missing something.

cc @joshuagl, @trishankatdatadog, @JustinCappos

tuf/formats.py Outdated
PATH_HASH_PREFIXES_SCHEMA = SCHEMA.ListOf(PATH_HASH_PREFIX_SCHEMA)

SUCCINCT_HASH_DELEGATIONS_SCHEMA = SCHEMA.Object(
prefix_bit_length = SCHEMA.Integer(lo=0, hi=32))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the tap defines it this way, so this comment is probably a better fit for theupdateframework/taps#120, but is it really necessary to use an object/dict with only one item? Can't we just assign the prefix bit length value directly to the succinct_hash_delegations field (and maybe rethink its name)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the object with one item feels a bit redundant. I'll post in the TAP to propose a change.

prefix_len = role['succinct_hash_delegations']['prefix_bit_length']
num_bins = 2 ** prefix_len
for i in range(num_bins):
bin_name = role['name'] + '-' + "{num:0{len}x}".format(num=i, len=prefix_len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Programming-style nit: I suggest to only use one of format strings or string concatenation, e.g.:

"{name}-{num:0{len}x}".format(name=role['name'], num=i, len=prefix_len)

@mnm678
Copy link
Contributor Author

mnm678 commented Aug 21, 2020

The main advantage of the new naming scheme for succinct hbd, i.e. "<DELEGATING_ROLE>.hbd-(0..2^BIT_LENGTH-1)", compared to the naming scheme of normal hbd, i.e. "<LOW>-<HIGH>", is that it adds the ability of different roles delegating to different hashed bins in one repository. Is there a use case for this? It seems more like a side-effect of the TAP than an original motivation. Also, if we didn't need that flexibility for normal hbd, why add it now for succinct hbd?

Yes, I believe the intention of the new naming scheme is to allow for different roles to create independent hashed bin delegations. I don't know that this ability has been requested by any existing implementer, but I can imagine a use case for registries as different uploaders to the same registry may each want to use succinct delegations. Does this seem like a likely use case? If not, we should probably stick with the existing naming scheme.

@JustinCappos
Copy link
Member

The main advantage of the new naming scheme for succinct hbd, i.e. "<DELEGATING_ROLE>.hbd-(0..2^BIT_LENGTH-1)", compared to the naming scheme of normal hbd, i.e. "<LOW>-<HIGH>", is that it adds the ability of different roles delegating to different hashed bins in one repository. Is there a use case for this? It seems more like a side-effect of the TAP than an original motivation. Also, if we didn't need that flexibility for normal hbd, why add it now for succinct hbd?

Yes, I believe the intention of the new naming scheme is to allow for different roles to create independent hashed bin delegations. I don't know that this ability has been requested by any existing implementer, but I can imagine a use case for registries as different uploaders to the same registry may each want to use succinct delegations. Does this seem like a likely use case? If not, we should probably stick with the existing naming scheme.

Yes, different parties need to be able to use this mechanism. I agree we must not omit the <DELEGATING_ROLE> from the name.

@lukpueh
Copy link
Member

lukpueh commented Aug 24, 2020

Thanks for addressing my comments and taking the naming discussion to the TAP page. Could you please also make sure to follow (our) commit guidelines.

@lukpueh
Copy link
Member

lukpueh commented Aug 24, 2020

I still think that we should harmonize the naming conventions, in order to reduce the delegated bin role name case handling in the code. Since the traditional hashed bin delegation does not rely on a name convention (at least not on the client), we could:

  • allow a custom role name prefix such as "<DELEGATING ROLE>.hbd-" there as well,
  • and we could also use the bin number as suffix instead of the hash prefix range.

succinct:
Whether the delegated role is used by succinct hashed bin delegations.
If succinct delegations are used, the role is added to '_succinct_delegations'
instead of '_delegated_roles'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

@joshuagl
Copy link
Member

I still think that we should harmonize the naming conventions, in order to reduce the delegated bin role name case handling in the code. Since the traditional hashed bin delegation does not rely on a name convention (at least not on the client), we could:

  • allow a custom role name prefix such as "<DELEGATING ROLE>.hbd-" there as well,
  • and we could also use the bin number as suffix instead of the hash prefix range.

I am also in favour of using consistent hashed bin naming conventions

@lukpueh
Copy link
Member

lukpueh commented Aug 27, 2020

The idea behind my suggestion from above was that we could use the same (slightly updated) hash bin helper functions for both types of hashed bin delegation. So that we don't have to concatenate strings / construct names anywhere else.

def create_bin_name(bin_info, prefix=""):
  """Returns prefix + suffix. 
  Suffix is computed based on the passed bin identifying info and follows a convention ,
  e.g. lo-hi of hex hash target prefix, or binary target prefix, or bin index, etc. """
  ...

def find_bin_for_target_hash(target_hash, bins_info):
  """Returns either the bin name or abstract bin info that can be used to create the name.
  bins_info is used to find out how many bins there are and what target prefixes each bin
  serves. It could be number of bins, or hash prefix bit length, etc."""
  ...

@mnm678 mnm678 force-pushed the succinct-hash-delegations branch from 826626e to 808047a Compare September 2, 2020 22:28
@mnm678
Copy link
Contributor Author

mnm678 commented Sep 3, 2020

@lukpueh I updated this pr to try to simplify the naming convention, as well as rebasing onto develop. One thing to note is that this updates the default way that non-succinct hashed bins are named. I don't think this breaks backwards compatibility as the previous naming convention (or any arbitrary one) is still supported by the client.

@lukpueh
Copy link
Member

lukpueh commented Sep 4, 2020

Thanks for the updates, @mnm678! I agree with you that the update shouldn't be a problem for clients. It is still a backwards incompatible change to the repository tool though. But I don't think that's a big problem. Although, we could just keep using the hash-prefix-range as role name suffix for both classic and succinct hbd, then we wouldn't have a problem at all.

Also, I noticed that you still repeat the bin role name creation in quite a few places (see various occurrences of {num:0{len}x} or name_prefix = prefix + '-'), don't you think it's better to do this in a single place as I suggested above in #1106 (comment)?

@lukpueh
Copy link
Member

lukpueh commented Sep 7, 2020

@mnm678, please ping me when you have addressed my comments and the PR is ready for another review, or if you need any help. :)

Also, could you please try to stick closer to our git commit and history guidelines (separation of commits, commit message wording and line length, no merge commits in feature branch, etc...). Thanks!

@mnm678 mnm678 force-pushed the succinct-hash-delegations branch from f494684 to a4a77c1 Compare September 8, 2020 20:56
This commit adds the repository metadata generation for
succinct hashed bin delegations.

Signed-off-by: marinamoore <[email protected]>
This commit updates support for succinct hashed bin delegations
in delegate and delegate_hashed_bins to create role metadata for
succinct delegations.

Signed-off-by: marinamoore <[email protected]>
…h repository.writeall()

This does not include support for loading a repository containing succinct
hashed bin delegations.

Signed-off-by: marinamoore <[email protected]>
This commit updates load_repository to properly handle succinct hashed bin delegations.

Signed-off-by: marinamoore <[email protected]>
Roles used by succinct hashed bin delegations are not
listed in the delegating metadata, so cannot be added to
_delegated_roles. Therefore the _succinct_delegations
field manages these roles.

Signed-off-by: marinamoore <[email protected]>
This commit replaces:

("succinct_hash_delegations" : {
    "prefix_bit_length" : BIT_LENGTH
})

with

("delegation_hash_prefix_len": BIT_LENGTH)

to reflect an update to the TAP and to simplify the implementation.

Signed-off-by: marinamoore <[email protected]>
This commit allows hashed bin delegations (both succinct and otherwise)
to be created with a custom prefix. If the prefix is not provided,
the existing defaults will be used.

As part of this addition, this commit cleans up some of the checks for
succinct hashed bin delegations to check for the existence of the
delegation_hash_prefix_len field rather than using the format of the name to
determine the type of delegation.

Signed-off-by: marinamoore <[email protected]>
…the client

To reflect the changes to the TAP and to the repository implementation,
this commit uses the delegation_hash_prefix_len notation for
succinct hashed bin delegations.

In addition, this commit updates the naming convention used by the
client for succinct hashed bin delegations.

Signed-off-by: marinamoore <[email protected]>
These tests ensure that the custom prefix names provided in
delegate_hashed_bins are used to name bin roles.

Signed-off-by: marinamoore <[email protected]>
Use the same naming convention for both succinct and non-succinct
hashed bins by taking in a prefix, or by default using
delegating_rolename.hbd-

This means that all bins will be named with a count
instead of listing the prefix range in the name

Signed-off-by: marinamoore <[email protected]>
Make sure that the keyids for succinct delegations
are properly loaded into the repository.

Signed-off-by: marinamoore <[email protected]>
Update the repo_lib helper functions to use a prefix
and create a name using a zero-padded count instead of
the prefix range. So the name will be of the format
prefix-count (for example bin.hbd-02).

In addition, this updates the repository tool to use
the updated repo_lib functions to avoid duplicate code.

Signed-off-by: marinamoore <[email protected]>
This updates the succinct hash bin delegation tests
to use the new naming convention. It also fixes a
small variable reuse issue in repository lib

Signed-off-by: marinamoore <[email protected]>
@mnm678 mnm678 force-pushed the succinct-hash-delegations branch from a4a77c1 to 4a732b9 Compare September 8, 2020 20:57
@mnm678
Copy link
Contributor Author

mnm678 commented Sep 8, 2020

@lukpueh I think this is ready for another look. I did some git history cleanup, but can make some more updates to the commit messages if needed.

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work here @mnm678, thank you.

I have a few questions and comments in line. Most significantly I wonder if the TAP requires that we change the way we name hbd. Could we support succinct hashed bin delegations with the hash-prefix range as calculated by the current code as the rolename suffix? That way we could harmonise our hashed bin code and not introduce a backwards incompatible change to repository_tool?

Hopefully this could help us implement succinct hashed bin delegations with fewer branches too.

bins by calculating the target path's hash prefix.
succinct:
If the delegation is to bins using succinct hashed bin delegations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is out of date? Aren't we expecting an integer argument here?

hashed bin delegations. Targets may be located and stored in hashed
bins by calculating the target path's hash prefix.
succinct:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
succinct:
succinct_hash_delegations:

zero-padded (up to prefix_len) strings i.e. for low=00, high=07,
prefix_len=3 the returned name would be '000-007'.
Create a string name of a delegated hash bin, where name will be
prefix-NUM where NUM will be a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment implies we insert the '-', but we don't appear to be doing that.

Create a string name of a delegated hash bin, where name will be
prefix-NUM where NUM will be a
zero-padded (up to prefix_len) string i.e. for bin_num=0, bin_num_length=2
prefix="pre", the name will be pre-00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, implies we're inserting the '-'.

logger.debug(repr(rolename) + ' already exists.')
if succinct:
if rolename in self._succinct_delegations:
logger.debug(repr(rolename) + ' already exists.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to ignore this? What happens if we load a repository with succinct hashed bin delegations and want to change i.e. the number of succinct hashed bins?

Is there any harm in adding the role to self._succinct_delegations even when it's already present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I think replacing any existing value makes more sense.

self._succinct_delegations[rolename] = targets_object

else:
self._delegated_roles[rolename] = targets_object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just leave this as-is and always assign. The debug info isn't useful, I don't think.

"threshold": 1,
"terminating": False,
"target_paths": []}
ordered_roles.append(role)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role is already appended at the end of the loop, outside of the conditional. I think we're adding the role to ordered_roles twice?

Suggested change
ordered_roles.append(role)

Comment on lines 2686 to 2688
"keyids": keyids,
"threshold": 1,
"terminating": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not required? We're setting these defaults later when iterating over ordered_roles

'version':1,
'threshold': 1,
'terminating': False,
'paths': []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are an empty paths and the version required? If so we should add them for non-succinct hbd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, we don't appear to be doing anything with this roleinfo dict? Let's just delete this else clause?

@mnm678
Copy link
Contributor Author

mnm678 commented Sep 23, 2020

I have a few questions and comments in line. Most significantly I wonder if the TAP requires that we change the way we name hbd. Could we support succinct hashed bin delegations with the hash-prefix range as calculated by the current code as the rolename suffix? That way we could harmonise our hashed bin code and not introduce a backwards incompatible change to repository_tool?

Thanks for the review @joshuagl! The main downside I see to using the current naming scheme is that it could make very long filenames if the prefix is sufficiently long. For example rolename-00000000-00000001, etc. Either way I can see the advantage to having at least an initial implementation that is backwards compatible.

mnm678 and others added 3 commits September 23, 2020 09:56
Co-authored-by: Joshua Lock <[email protected]>
Signed-off-by: marinamoore <[email protected]>
Make sure that docstrings match the functions

Signed-off-by: marinamoore <[email protected]>
Remove unnecessary or duplicate code in response to
feedback on the pr.

This also removes the existance check in add_delegated_role
so that the value passed will replace any existing delegation.
This allows the caller to replace an outdated role using
this function.

Signed-off-by: marinamoore <[email protected]>
@mnm678 mnm678 force-pushed the succinct-hash-delegations branch from b1d22cc to 589840e Compare September 23, 2020 16:56
This commit creates a 'succinct_hash_delegations' field that contains
'delegation_hash_prefix_len' and 'bin_name_prefix' to be used for
succinct hashed bin delegations. When this field is present, the
'name' field is no longer needed, and so is removed from the metadata.

For more detail, see the succinct hashed bin delegation TAP pr.

Signed-off-by: marinamoore <[email protected]>
for delegated_role in roleinfo['delegations']['roles']:
delegated_roles.append(delegated_role['name'])
if 'succinct_hash_delegations' in delegated_role:
delegated_roles.append(delegated_role['succinct_hash_delegations']['bin_name_prefix'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the prefix enough here, or do we need the full role name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the expected behavior here would be. A single succinct delegation in the metadata corresponds to many actual delegations, so should this function return names of the delegated roles in the metadata file (what it currently does), or the names of all roles delegated to by the current role (including the full names of all succinct delegations)?

Either way, I'll be sure to document it in the docstring.

@joshuagl joshuagl changed the title Succinct hashed bin delegations PoC(tap): Succinct hashed bin delegations Nov 26, 2020
@joshuagl joshuagl marked this pull request as draft November 26, 2020 16:28
@joshuagl
Copy link
Member

As this is a proof of concept for a TAP I've converted it to a draft PR.

@lukpueh
Copy link
Member

lukpueh commented Feb 17, 2022

Now with the legacy code gone this PR will need quite a re-write.

Most of above comments were either relevant to the legacy code, or to the TAP itself, and have been addressed in the taps repo. So I suggest to close here and re-open a fresh PR.

For the repository side, since we currently don't have a repository tool, it's probably best to prepare something like the hashed bin delegation example, and the client-side code should work with ngclient.

@lukpueh lukpueh closed this Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants