-
Notifications
You must be signed in to change notification settings - Fork 191
treat sync1.1 errors as warnings #989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good, and a couple good little cleanups along the way.
I flagged a couple places we probably don't need warn-versus-error: #sync
events aren't emitted by pre-sync1.1 software at all, and DID syntax in #commit
events can always be enforced strictly (not a sync1.1 change)
cmd/relay/bgs/validator.go
Outdated
|
||
did, err := syntax.ParseDID(msg.Did) | ||
if err != nil { | ||
syncVerifyErrors.WithLabelValues(hostname, "did").Inc() | ||
return nil, err | ||
if !val.Sync11ErrorsAreWarnings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all these #sync
event checks could be errors-not-warnings. The whole event is new in sync v1.1, so there shouldn't be any problem with old implementations outputting events which don't validate. It is only the #commit
events we need to handle with warning-not-error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is where I thought the discussion went yesterday, that we have to allow mediocre implementation of sync1.1 for a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main point is to allow PDS instances which don't implement Sync v1.1 at all, not ones which attempt to implement it but mess up.
aka, we want code which was running and working in December 2024 to keep running for a while longer. but code which was implemented last week and implemented sync1.1 wrong can error.
d604306
to
e5f3163
Compare
cmd/relay/bgs/validator.go
Outdated
commit, err := atrepo.LoadCARCommit(ctx, bytes.NewReader([]byte(msg.Blocks))) | ||
if err != nil { | ||
commitVerifyErrors.WithLabelValues(hostname, "car").Inc() | ||
return nil, err | ||
if !val.Sync11ErrorsAreWarnings { | ||
return nil, err | ||
} else { | ||
logger.Warn("invalid car", "err", err) | ||
} | ||
} | ||
|
||
if commit.Rev != rev.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there was an error in loading the CAR, I would expect commit
to be nil
. If that's the case, falling through on the "invalid car" warn would cause a nil pointer exception here checking the rev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed (and in commit handler)
cmd/relay/bgs/validator.go
Outdated
commit, err := atrepo.LoadCARCommit(ctx, bytes.NewReader([]byte(msg.Blocks))) | ||
if err != nil { | ||
commitVerifyErrors.WithLabelValues(hostname, "car").Inc() | ||
if !val.Sync11ErrorsAreWarnings { | ||
if !val.Sync11ErrorsAreWarnings || commit == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect that if err != nil
that the commit
is going to be nil
. Is that not the case? I think that means we always have to error if we fail to read the commit from the CAR, there's no case where we'll end-up warning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, yeah, content and err are mutually exclusive, cleaned up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are a few more instances where a check might error, resulting in nil / zero-like values.
especially for the MST inversion case, I think the easiest fix is to split code out to a sub-method which fails or succeeds all together, and then warn-or-error on that overall result.
cmd/relay/bgs/validator.go
Outdated
} | ||
if commit.DID != did.String() { | ||
commitVerifyErrors.WithLabelValues(hostname, "did2").Inc() | ||
return nil, fmt.Errorf("rev did not match commit") | ||
if !val.Sync11ErrorsAreWarnings { | ||
return nil, fmt.Errorf("rev did not match commit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error message should be about DID, not about rev: "message DID and commit DID differed" or something like that
cmd/relay/bgs/validator.go
Outdated
if !val.Sync11ErrorsAreWarnings { | ||
return nil, fmt.Errorf("invalid repo path in ops list: %w", err) | ||
} else { | ||
logger.Warn("invalid repo path", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to some of the other branches: if there is a ParseRepoPath
error, then nsid
and rkey
will be empty strings, and the code path below won't work.
I think this could just always be an error: there should be zero or very-near-zero commits floating around the network which would fail this check; not a "Sync v1.1" change.
cmd/relay/bgs/validator.go
Outdated
if !val.Sync11ErrorsAreWarnings { | ||
return nil, err | ||
} else { | ||
logger.Warn("invalid record cid", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the warning message should be something like "record CID not found in CAR diff" (not "invalid record cid")
cmd/relay/bgs/validator.go
Outdated
if !val.Sync11ErrorsAreWarnings { | ||
return nil, err | ||
} else { | ||
logger.Warn("invalid commit ops", "err", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as with some of the other code branches, if ParseCommitOps
fails, I assume that ops
will be nil/empty, and the checks below (like NormalizeOps
, and inversion in general) won't make sense.
maybe one way to make this handling simpler would be a "VerifyCommitMessageOps" sub-method. the "Sync11ErrorsAreWarnings" check can be implemented based on the output of that overall function, instead of for each step which might fail.
cmd/relay/bgs/validator.go
Outdated
|
||
did, err := syntax.ParseDID(msg.Did) | ||
if err != nil { | ||
syncVerifyErrors.WithLabelValues(hostname, "did").Inc() | ||
return nil, err | ||
if !val.Sync11ErrorsAreWarnings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main point is to allow PDS instances which don't implement Sync v1.1 at all, not ones which attempt to implement it but mess up.
aka, we want code which was running and working in December 2024 to keep running for a while longer. but code which was implemented last week and implemented sync1.1 wrong can error.
return nil, err | ||
} else { | ||
logger.Warn("invalid rev", "err", err) | ||
} | ||
} | ||
if rev.Time().After(time.Now().Add(val.maxRevFuture)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rev
might be zero-like (empty string) if syntax check above failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in HandleCommit I'd changed it to assume valid DID and rev TID are basic; propagated that here too
if val.Sync11ErrorsAreWarnings { | ||
logger.Warn("invalid car", "err", err) | ||
} | ||
return nil, fmt.Errorf("invalid car, %w", err) | ||
} | ||
|
||
if commit.Rev != rev.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit
will be zero-like here if LoadCARCommit
failed above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's okay because we do always error out on that error
return nil, fmt.Errorf("rev did not match commit") | ||
} else { | ||
logger.Warn("message rev != commit.rev") | ||
} | ||
} | ||
if commit.DID != did.String() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit
will be zero-like here if LoadCARCommit
failed above
mostly reorg a couple chains of actions into functions so bail out is easier
ok, I think I cleaned up all the places where we should bail on a sequence of steps if there's an error, bundled things up into two new inner functions to organize that. |
No description provided.