-
Notifications
You must be signed in to change notification settings - Fork 577
fix(instrumentation-undici): fix several header handling handling bugs #2781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(instrumentation-undici): fix several header handling handling bugs #2781
Conversation
The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal. Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases. Similarly, I also fixed an incorrect case in undici v5 where it would treat a `user-agent-bogus` header as a user agent, but I couldn't write a test case since the tests run with the newer version.
(Working on the CLA with my company, should have that fairly soon.) |
continue; | ||
} | ||
const key = line.substring(0, colonIndex); | ||
const value = line.substring(0, colonIndex + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this capture the key again since you're starting from the zeroth index of the line
string again? Could be something like: const value = line.substring(colonIndex + 1).trim();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, you're right!
Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, you're right!
Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?
The test-all- versions
script should perform these tests. You can inspect which versions are tested in the .tav.yml
file at the root of this instrumentation.
Hi @benjaminjkraft, thanks for contributing to opentelemetry and welcome :) From your message I couldn't understand what are exactly the errors this PR fixes. I see you mention:
I'll check your changes assuming both points above. Please let me know if there is something else this PR is fixing soit can be mentioned in the changelog. Cheers |
if (key.toLowerCase() === 'user-agent') { | ||
// user-agent should only appear once per the spec, but the library doesn't | ||
// prevent passing it multiple times, so we handle that to be safe. | ||
const userAgent = Array.isArray(value) ? value[0] : value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One may wonder if the 1st occurrence of the UA header value may be the best option but I guess there is no best option in this scenario. If the request not complies the spec and sets multiple values for UA it may be good to log this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could log it, although I dunno where that goes in practice? We would probably also be spec-compliant-ish to do value.join(", ")
.
// prevent passing it multiple times, so we handle that to be safe. | ||
const userAgent = Array.isArray(value) ? value[0] : value; | ||
attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent; | ||
return true; // no need to keep iterating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IMO forEachXXX
name implies we're executing for all items in the collection. Depending on a previous result to decide to continue the loop does not follow the semantics of the name. Without the comment after the return statement one not familiar with the code will not know that the loop breaks.
The control of the format and access to the headers in a separate function is a good idea but maybe it will serve better if it returns an iterable object. This way the consumer code can loop through it (and break the loop) with the language specifics like for..of and break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I can do an iterable, I wasn't totally sure if that was kosher yet for the lib but I guess they're supported pretty much everywhere these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for a more explicit list of the issues this fixes:
- (undici v6+) the code crashed if any header (preceding
user-agent
) is multi-valued - (undici v6+) the code used the wrong value if a header value preceding the
user-agent
header was"user-agent"
(it would log the next header-key as the user-agent) - (undici v5) the code would treat any header starting with
user-agent
, e.g.user-agent-bogus
as theuser-agent
(CLA is in review, I'll get back to this later this week once it's approved.)
if (key.toLowerCase() === 'user-agent') { | ||
// user-agent should only appear once per the spec, but the library doesn't | ||
// prevent passing it multiple times, so we handle that to be safe. | ||
const userAgent = Array.isArray(value) ? value[0] : value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could log it, although I dunno where that goes in practice? We would probably also be spec-compliant-ish to do value.join(", ")
.
// prevent passing it multiple times, so we handle that to be safe. | ||
const userAgent = Array.isArray(value) ? value[0] : value; | ||
attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent; | ||
return true; // no need to keep iterating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I can do an iterable, I wasn't totally sure if that was kosher yet for the lib but I guess they're supported pretty much everywhere these days.
continue; | ||
} | ||
const key = line.substring(0, colonIndex); | ||
const value = line.substring(0, colonIndex + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eek, you're right!
Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?
Thanks for the reviews -- pushed updates to fix the v5 typo and swap to generators (which indeed look cleaner). (Also, CLA is now signed.) Let me know if there's anything else you'd like to see! |
You can use the script You have a compile issue :( |
@benjaminjkraft did you have time to look at the compile issue? Do you need any kind of assistance? |
@benjaminjkraft are you able to finish this PR? Let me know if you need me to jump in :) |
Which problem is this PR solving?
The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal.
Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases.
Similarly, I also fixed an incorrect case in undici v5 where it would treat a
user-agent-bogus
header as a user agent, but I couldn't write a test case since the tests run with the newer version.Short description of the changes
The relevant code is rewritten to handle multiple headers consistently, and refactored so we don't have to write it all twice.