gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative #92900

Merged: 4 commits into python:main on Jun 15, 2025

@dlenski dlenski commented May 17, 2022

This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:

typing.List[typing.Tuple[str, None]], of length exactly 1 (when the header string contains no encoded words)
or typing.List[typing.Tuple[bytes, typing.Optional[str]]] (when it contains at least one encoded word)

This function can't be rewritten to be more consistent in a backwards-compatible way, because some users of this function depend on the existing return type(s).
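
For code that has to keep calling decode_header(), the two return shapes can be handled with an explicit type check. The following is only a minimal sketch: the helper name `header_to_text` is hypothetical (it is not part of the `email` package or of this PR), and it assumes that unencoded `bytes` chunks are plain ASCII and that every declared charset is one Python knows how to decode.

```python
from email.header import decode_header


def header_to_text(value: str) -> str:
    """Join the chunks returned by decode_header() into a single str.

    Hypothetical helper for illustration only; not part of the stdlib.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-word case: chunks are bytes. Chunks that were not
            # encoded carry charset None; ASCII is assumed for those here.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # No-encoded-word case: the single chunk is already a str.
            parts.append(chunk)
    return "".join(parts)


for raw in (
    "hello foo bar",
    "hello =?utf-8?B?ZsOzbw==?= bar",
    "=?iso-8859-1?q?hello_f=F3o_bar?=",
):
    print(repr(header_to_text(raw)))
```

The `isinstance` branch is the part every caller of decode_header() ends up needing, which is the surprise this PR documents.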

This PR addresses the inconsistencies:

  1. as suggested by @JelleZijlstra in #67022 (comment), "The decode_header() function decodes raw part to bytes or str, depending on encoded part":

    we should document the surprising return type at https://docs.python.org/3.10/library/email.header.html.

  2. by suggesting email.headerregistry.HeaderRegistry as a replacement, as suggested in #92900 (comment)

Example of the old/inconsistent (decode_header) vs. modern/sane approaches:

>>> from email.header import decode_header
>>> from email.headerregistry import HeaderRegistry
>>>
>>> # decode_header exposes differences in sub-encodings AND a str/bytes inconsistency in return type:
>>> print(decode_header('hello foo bar'))
[('hello foo bar', None)]
>>> print(decode_header('hello =?utf-8?B?ZsOzbw==?= bar'))
[(b'hello ', None), (b'f\xc3\xb3o', 'utf-8'), (b' bar', None)]
>>> print(decode_header('=?iso-8859-1?q?hello_f=F3o_bar?='))
[(b'hello f\xf3o bar', 'iso-8859-1')]
>>>
>>> # HeaderRegistry has a sane and consistent interface:
>>> decoder = HeaderRegistry()
>>> decoder('Subject', 'hello foo bar')
'hello foo bar'
>>> decoder('Subject', 'hello =?utf-8?B?ZsOzbw==?= bar')
'hello fóo bar'
>>> decoder('Subject', '=?iso-8859-1?q?hello_f=F3o_bar?=')
'hello fóo bar'
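
When a whole message is parsed rather than a single header value, the same decoding is available by parsing with `email.policy.default`. The sketch below is illustrative only: the raw message text and the example.com address are made up, and it relies on the default policy building header values through its header factory.

```python
from email import message_from_string
from email.policy import default

# Made-up message used purely for illustration.
raw = (
    "Subject: hello =?utf-8?B?ZsOzbw==?= bar\n"
    "From: sender@example.com\n"
    "\n"
    "body\n"
)

# With policy=default, header values come back as decoded str objects,
# so no bytes/str branching is needed in the caller.
msg = message_from_string(raw, policy=default)
subject = msg["Subject"]
print(repr(str(subject)))
```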

(Closes #30548 and replaces it.)

@dlenski dlenski requested a review from a team as a code owner May 17, 2022 20:52
ghost commented May 17, 2022

All commit authors signed the Contributor License Agreement.

@warsaw warsaw left a comment

In general, I think this would help users of the legacy API, although I think we should also steer people to the new API. What does @bitdancer think?

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

dlenski commented Jul 20, 2022

I have made the requested changes; please review again, @warsaw.

> And if you don't make the requested changes, you will be put in the comfy chair!

😂

@bedevere-bot

Thanks for making the requested changes!

@warsaw: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from warsaw July 20, 2022 21:24
dlenski commented Feb 21, 2023

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@warsaw: please review the changes made to this pull request.

dlenski added a commit to dlenski/cpython that referenced this pull request Jun 11, 2025
…email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in python#92900 (comment)
@dlenski dlenski changed the title from "gh-67022: Document bytes/str inconsistency in email.header.decode_header() and add .decode_header_to_string() as a sane alternative" to "gh-67022: Document bytes/str inconsistency in email.header.decode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative" Jun 11, 2025
@bedevere-app bedevere-app bot requested a review from bitdancer June 14, 2025 04:48
dlenski added a commit to dlenski/cpython that referenced this pull request Jun 14, 2025
…email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in python#92900 (comment)
dlenski added a commit to dlenski/cpython that referenced this pull request Jun 14, 2025
Per python#92900 (comment), not
wanted for doc-only PRs.
dlenski added 3 commits June 13, 2025 22:09
…de_header()

This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This means that any user of this function must be prepared to accept either
`bytes` or `str` for the first member of the 2-tuples it returns, which is a
very surprising behavior in Python 3.x, particularly given that the second
member of the tuple is supposed to represent the charset/encoding of the
first member.

This patch documents the behavior of this function, and adds test cases
to demonstrate it.

As discussed in bpo-22833, this cannot be changed in a backwards-compatible
way, and some users of this function depend precisely on the existing
behavior.
…email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in python#92900 (comment)
Per python#92900 (comment), not
wanted for doc-only PRs.
@bitdancer bitdancer left a comment

LGTM

@bitdancer bitdancer merged commit 60181f4 into python:main Jun 15, 2025
42 of 43 checks passed
@bitdancer bitdancer added the "needs backport to 3.13" and "needs backport to 3.14" labels Jun 15, 2025
@miss-islington-app

Thanks @dlenski for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington-app

Thanks @dlenski for the PR, and @bitdancer for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2025
…de_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (pythonGH-92900)

* pythongh-67022: Document bytes/str inconsistency in email.header.decode_header()

This function's possible return types have been surprising and error-prone
for the entirety of its Python 3.x history. It can return either:

1. `typing.List[typing.Tuple[bytes, typing.Optional[str]]]` of length >1
2. or `typing.List[typing.Tuple[str, None]]`, of length exactly 1

This means that any user of this function must be prepared to accept either
`bytes` or `str` for the first member of the 2-tuples it returns, which is a
very surprising behavior in Python 3.x, particularly given that the second
member of the tuple is supposed to represent the charset/encoding of the
first member.

This patch documents the behavior of this function, and adds test cases
to demonstrate it.

As discussed in bpo-22833, this cannot be changed in a backwards-compatible
way, and some users of this function depend precisely on the existing
behavior.

Add warnings about obsolescence of 'email.header.decode_header' and 'email.header.make_header' functions.

Recommend use of `email.headerregistry.HeaderRegistry` instead, as suggested
in python#92900 (comment)
(cherry picked from commit 60181f4)

Co-authored-by: Dan Lenski <[email protected]>
bedevere-app bot commented Jun 15, 2025

GH-135548 is a backport of this pull request to the 3.14 branch.

bedevere-app bot commented Jun 15, 2025

GH-135549 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the "needs backport to 3.13" label Jun 15, 2025
bitdancer pushed a commit that referenced this pull request Jun 15, 2025
…ode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) (#135548)

bitdancer pushed a commit that referenced this pull request Jun 15, 2025
…ode_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (GH-92900) (#135549)

shuimu5418 pushed a commit to shuimu5418/cpython001 that referenced this pull request Jun 16, 2025
…de_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (python#92900)

lkollar pushed a commit to lkollar/cpython that referenced this pull request Jun 19, 2025
…de_header() and suggest email.headerregistry.HeaderRegistry as a sane alternative (python#92900)
