Skip to content

fix(instrumentation-undici): fix several header handling handling bugs #2781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

benjaminjkraft
Copy link

Which problem is this PR solving?

The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal.

Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases.

Similarly, I also fixed an incorrect case in undici v5 where it would treat a user-agent-bogus header as a user agent, but I couldn't write a test case since the tests run with the newer version.

Short description of the changes

The relevant code is rewritten to handle multiple headers consistently, and refactored so we don't have to write it all twice.

The handling for User-Agent had a bunch of bugs, most notably with the handling of multiple-valued headers, which are rare but legal.

Code similar to the added test "another header with multiple values" caused us errors in production from the user-agent code, and is where I started. Reading the code, writing tests, and improving the types revealed several more bugs in the same code as well as the span-to-attributes handling; the added tests "multiple user-agents", "another header with value user-agent" also fail before this PR (the others are just to actually cover the ordinary cases.

Similarly, I also fixed an incorrect case in undici v5 where it would treat a `user-agent-bogus` header as a user agent, but I couldn't write a test case since the tests run with the newer version.
@benjaminjkraft benjaminjkraft requested a review from a team as a code owner April 3, 2025 19:50
Copy link

linux-foundation-easycla bot commented Apr 3, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@benjaminjkraft
Copy link
Author

(Working on the CLA with my company, should have that fairly soon.)

continue;
}
const key = line.substring(0, colonIndex);
const value = line.substring(0, colonIndex + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this capture the key again since you're starting from the zeroth index of the line string again? Could be something like: const value = line.substring(colonIndex + 1).trim();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, you're right!

Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, you're right!

Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?

The test-all- versions script should perform these tests. You can inspect which versions are tested in the .tav.yml file at the root of this instrumentation.

@david-luna
Copy link
Contributor

Hi @benjaminjkraft,

thanks for contributing to opentelemetry and welcome :)

From your message I couldn't understand what are exactly the errors this PR fixes. I see you mention:

  • header values provided as an array of strings (multiple-valued headers)
  • headers prefixed by user-agent may be considered the user agent wrongly.

I'll check your changes assuming both points above. Please let me know if there is something else this PR is fixing soit can be mentioned in the changelog.

Cheers

if (key.toLowerCase() === 'user-agent') {
// user-agent should only appear once per the spec, but the library doesn't
// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One may wonder if the 1st occurrence of the UA header value may be the best option but I guess there is no best option in this scenario. If the request not complies the spec and sets multiple values for UA it may be good to log this issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could log it, although I dunno where that goes in practice? We would probably also be spec-compliant-ish to do value.join(", ").

// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent;
return true; // no need to keep iterating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IMO forEachXXX name implies we're executing for all items in the collection. Depending on a previous result to decide to continue the loop does not follow the semantics of the name. Without the comment after the return statement one not familiar with the code will not know that the loop breaks.

The control of the format and access to the headers in a separate function is a good idea but maybe it will serve better if it returns an iterable object. This way the consumer code can loop through it (and break the loop) with the language specifics like for..of and break

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I can do an iterable, I wasn't totally sure if that was kosher yet for the lib but I guess they're supported pretty much everywhere these days.

Copy link
Author

@benjaminjkraft benjaminjkraft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for a more explicit list of the issues this fixes:

  • (undici v6+) the code crashed if any header (preceding user-agent) is multi-valued
  • (undici v6+) the code used the wrong value if a header value preceding the user-agent header was "user-agent" (it would log the next header-key as the user-agent)
  • (undici v5) the code would treat any header starting with user-agent, e.g. user-agent-bogus as the user-agent

(CLA is in review, I'll get back to this later this week once it's approved.)

if (key.toLowerCase() === 'user-agent') {
// user-agent should only appear once per the spec, but the library doesn't
// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could log it, although I dunno where that goes in practice? We would probably also be spec-compliant-ish to do value.join(", ").

// prevent passing it multiple times, so we handle that to be safe.
const userAgent = Array.isArray(value) ? value[0] : value;
attributes[SemanticAttributes.USER_AGENT_ORIGINAL] = userAgent;
return true; // no need to keep iterating
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I can do an iterable, I wasn't totally sure if that was kosher yet for the lib but I guess they're supported pretty much everywhere these days.

continue;
}
const key = line.substring(0, colonIndex);
const value = line.substring(0, colonIndex + 1);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eek, you're right!

Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?

@benjaminjkraft
Copy link
Author

Thanks for the reviews -- pushed updates to fix the v5 typo and swap to generators (which indeed look cleaner). (Also, CLA is now signed.) Let me know if there's anything else you'd like to see!

@david-luna
Copy link
Contributor

Which makes me think -- do we have a way to actually unit test the patches against older library versions? I guess that would have to be a separate test package?

You can use the script npm run test-all-versions which will install and test with different versions of undici. You could even update .tav.yml file since Node.js v14 and v16 support was dropped in #2738

You have a compile issue :(
https://github.com/open-telemetry/opentelemetry-js-contrib/actions/runs/14367208545/job/40308049357?pr=2781

@david-luna
Copy link
Contributor

@benjaminjkraft did you have time to look at the compile issue? Do you need any kind of assistance?

@david-luna
Copy link
Contributor

@benjaminjkraft are you able to finish this PR? Let me know if you need me to jump in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants