Switch license metadata to the PEP 639 format #13335

Open
wants to merge 1 commit into
base: main
from

Conversation

SpecLad
Contributor

@SpecLad SpecLad commented Apr 15, 2025

Also, include all license files for the vendored dependencies inside the wheel, and in the License-File package metadata field.

License files are included in distributions automatically, so remove them from MANIFEST.in.

@ichard26
Member

While I'm generally in favour of dogfooding the latest standards, I'm wary of making our setuptools requirement so tight. Setuptools 77.0.1 was only released ~a month ago. This may cause some issues for our redistributors who want to build pip from source. Perhaps we can merge this for the pip 25.2 release in July?

@SpecLad
Contributor Author

SpecLad commented Apr 15, 2025

While I'm generally in favour of dogfooding the latest standards, I'm wary of making our setuptools requirement so tight. Setuptools 77.0.1 was only released ~a month ago. This may cause some issues for our redistributors who want to build pip from source. Perhaps we can merge this for the pip 25.2 release in July?

Sure, I don't mind. Just ping me if you need this rebased.

@pfmoore
Member

pfmoore commented Apr 15, 2025

Agreed. Let's leave this to 25.2, at least.

Also, I'm not in favour of reporting pip's license as that complex expression. I don't know what legal implications there may be, but I consider pip's license to be MIT, and that's what we should report it as. The licenses of our vendored dependencies (in my view, at least) are just that - licenses of dependencies. Vendoring is simply a distribution method, and shouldn't affect our license.

I'm happy for someone with more legal expertise than me to explain the implications, but as a maintainer, what I want people to see when they look at (for example) PyPI, is that we license pip under the MIT license.

@ichard26 ichard26 added this to the 25.2 milestone Apr 15, 2025
@SpecLad
Contributor Author

SpecLad commented Apr 15, 2025

Vendoring is simply a distribution method, and shouldn't affect our license.

I don't really have a horse in this race, so I can set the license expression to whatever the maintainer consensus is. However, I do believe it would be correct to include the vendored licenses. License-Expression is a project-wide setting, and there are files in the pip project that are covered by all of these different licenses. I don't see how one could make a case that the license of, say, pip/_internal/cache.py "counts", but the license of pip/_vendor/packaging/markers.py does not, when both of these files are in the sdist/wheel.

In addition, PEP 639 explicitly calls out vendored dependencies as a case where multiple licenses need to be specified:

The current license classifiers could be extended to include the full range of the SPDX identifiers while deprecating the ambiguous classifiers (such as License :: OSI Approved :: BSD License).

However, there are multiple arguments against such an approach:

  • [...]
  • It only covers packages under a single license; it doesn’t address projects that vendor dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging) or were relicensed, adapt code from other projects or contain fonts, images, examples, binaries or other assets under other licenses.

(although sadly it does not offer specific advice to such projects)

pyproject.toml Outdated
Comment on lines 8 to 17
# Apache-2.0 OR BSD-2-Clause: packaging
# Apache-2.0: cachecontrol, distro, msgpack, requests
# BSD-2-Clause: pygments
# BSD-3-Clause: idna
# ISC: resolvelib
# MIT: dependency-groups, pip, platformdirs, pyproject-hooks, rich, setuptools,
# urllib3, tomli, truststore
# MPL-2.0: certifi
# PSF-2.0: distlib, typing-extensions
license = "Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND ISC AND MIT AND MPL-2.0 AND PSF-2.0"
Member

Not a blocking concern but I think we should have some automated validation of this, because I believe that someone will forget to update this when we update vendored libraries if we don't validate this somehow.

Contributor Author

It shouldn't be too hard to, say:

  1. Put a text file next to vendor.txt that maps each library to its license expression.
  2. Add a nox session that verifies that
    1. each library has a corresponding expression, and
    2. the license expression in pyproject.toml is a combination of MIT and all the expressions of the vendored libraries.

However, I'm not doing that unless the team agrees that the complex license expression should be used to begin with.
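For illustration only, the check described in the two steps above could be sketched in Python roughly as follows. This is not code from the PR: the function names, the in-memory mapping, and the simplification of comparing only the set of license identifiers (treating `OR` like `AND` and ignoring `WITH` exceptions) are all assumptions made for the sketch.

```python
def license_ids(expression: str) -> set[str]:
    """Return the license identifiers in a simple SPDX expression.

    Assumption: only identifiers joined by AND/OR and parentheses are
    handled; WITH exceptions are not, and identifiers are not validated.
    """
    tokens = expression.replace("(", " ").replace(")", " ").split()
    return {t for t in tokens if t.upper() not in {"AND", "OR"}}


def check_license_metadata(vendored, licenses, project_license, combined):
    """Return a list of problems; an empty list means the check passed.

    vendored:        names of the vendored libraries (e.g. read from vendor.txt)
    licenses:        hypothetical mapping of library name -> SPDX expression
    project_license: the project's own license expression (e.g. "MIT")
    combined:        the expression recorded in pyproject.toml
    """
    # Step 1: every vendored library must have a recorded expression.
    problems = [
        f"no license recorded for {name}"
        for name in sorted(vendored)
        if name not in licenses
    ]

    # Step 2: the combined expression must mention exactly the identifiers
    # used by the project itself plus all vendored libraries.
    expected = license_ids(project_license)
    for name in vendored:
        if name in licenses:
            expected |= license_ids(licenses[name])
    actual = license_ids(combined)

    for ident in sorted(expected - actual):
        problems.append(f"{ident} missing from combined expression")
    for ident in sorted(actual - expected):
        problems.append(f"{ident} not used by any vendored library")
    return problems
```

A nox session could call `check_license_metadata` and fail if the returned list is non-empty.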

Member

I think it would be better to either (a) inline that metadata as comments on vendor.txt or (b) have that metadata be inferred from the license metadata of the packages themselves.

@pfmoore
Member

pfmoore commented Apr 19, 2025

I don't see how one could make a case that the license of, say, pip/_internal/cache.py "counts", but the license of pip/_vendor/packaging/markers.py does not, when both of these files are in the sdist/wheel.

Conversely, I don't see why pip/_vendor/packaging/markers.py affects pip's license, when the license of certifi doesn't affect the license of requests. Packaging is a dependency of pip and certifi is a dependency of requests. Yes, pip vendors packaging, as a convenience for our users, but that doesn't mean that packaging is part of pip (and indeed, Linux distros devendor pip's vendored dependencies - would they have to remove the extra clauses from pip's license as well?)

I stand by my preference - I'd like License-Expression to just say "MIT".

@pradyunsg
Member

pradyunsg commented Apr 19, 2025

The difference is that we're redistributing the code -- requests is not redistributing certifi by vendoring it.

We effectively have "different source files each governed by different licenses" within our distribution files (using language from the SPDX license expression docs, where they describe the rationale for having an AND operator) -- which is what this PR makes our license metadata capture more clearly.

We do this implicitly by including license files for each vendored package and this PR makes that explicit.

@pradyunsg
Member

Linux distros devendor pip's vendored dependencies

AFAIK, no Linux distribution is devendoring pip other than Gentoo.

@pfmoore
Member

pfmoore commented Apr 19, 2025

Let me ask the question another way. Where is the license of pip, specifically, supposed to be recorded? We have no intention of changing pip’s license from MIT - so how do we record that information in the metadata?

@pradyunsg
Member

pradyunsg commented Apr 19, 2025

Where is the license of pip, specifically, supposed to be recorded?

It's the top-level LICENSE file, so implied via that. Otherwise, we can also do something like:

# Licenses for vendored code are listed in parentheses
license = "MIT AND (Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND ISC AND MIT AND MPL-2.0 AND PSF-2.0)"

AFAIK, it's valid to have MIT AND (... MIT ...) in an SPDX license expression.

We have no intention of changing pip’s license from MIT - so how do we record that information in the metadata?

We don't record this today, and I don't see why we need to be trying to record this. I'll flip the question: Why do you want this, and how does the current license scheme cover that need?

@pradyunsg
Member

pradyunsg commented Apr 19, 2025

Bah, I need to make an edit -- hold please. Done. I had posted with an eager enter. I've lost muscle memory for GitHub comments. 🙈

@SpecLad
Contributor Author

SpecLad commented Apr 19, 2025

FWIW, if there's a desire to unambiguously label the license of specifically pip's own files, then there is a solution for that: you could add an # SPDX-License-Identifier: MIT comment to every source file.
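If that route were taken, a small script could report files that lack the tag. The following is a minimal sketch under assumptions that are not from the thread: the function name, the choice to scan only `*.py` files, the `_vendor` directory being skipped, and the tag being expected within the first five lines are all illustrative.

```python
from pathlib import Path

SPDX_TAG = "SPDX-License-Identifier:"


def files_missing_spdx_tag(root, skip_dirs=("_vendor",), max_lines=5):
    """Return the .py files under root whose first lines lack an SPDX tag.

    Files inside any directory named in skip_dirs are ignored, since
    vendored code carries its own licensing.
    """
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            # Read at most max_lines lines; next() returns "" at end of file.
            head = [next(f, "") for _ in range(max_lines)]
        if not any(SPDX_TAG in line for line in head):
            missing.append(path)
    return missing
```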

@pfmoore
Member

pfmoore commented Apr 19, 2025

We don't record this today

We do. In the classifiers. And that's what has historically been displayed on PyPI.

I'll flip the question: Why do you want this, and how does the current license scheme cover that need?

I want this because I want it to be clear to everyone that pip is licensed under the MIT license. Prior to PEP 639, things were clear - we documented our license in the classifiers and in the License metadata, and we shipped the license files for our vendored dependencies. PyPI showed "License - MIT".

I don't see how PEP 639 changes things. We still ship our dependencies' license files (and we now record them along with our own in License-Files). We report our license (MIT) in the new License-Expression field, replacing the classifiers and the old License field. PyPI will still report "License - MIT". Why is this a problem?

The idea that we put some sort of combined license expression seems to have come from nowhere - the only reference to vendoring in PEP 639 was quoted above, but out of context. It was in relation to the question "Why can't we just extend the license classifiers and use them?" And in that context, "Because classifiers don't handle vendoring well" is a perfectly good answer. The scheme in PEP 639 does handle vendoring, though[1] - it has License-Expression to record the project license, and License-Files to record all license files relevant to the project, including those for vendored dependencies. That should be sufficient.

Let me ask another question. If this PR had simply changed license = {text = "MIT"} to license = "MIT", without further comment, would anyone have objected? Would we even be having a debate?

FWIW, if there's a desire to unambiguously label the license of specifically pip's own files, then there is a solution for that: you could add an # SPDX-License-Identifier: MIT comment to every source file.

There's no such desire - at least not from me. My desire is to retain the previous form, where we stated clearly that pip's overall license was MIT. I dislike putting license comments in every file. We'll inevitably forget one (pyproject.toml? vendor.txt? The documentation?) and it's unnecessary clutter, as long as we document the overall project license properly.

I'm sorry for making an issue of this. I hate licensing debates with a vengeance, and I would much rather we never had them. But more than just the debates, I hate the fact that those debates end up with people giving up on things that matter to them, just because it is too much effort to continue arguing. So with that in mind, it matters to me that we report License-Expression: MIT, and I will continue to argue for that (and block this PR) unless someone can demonstrate that there is a legal requirement that we report the expression currently in the PR.

Footnotes

  1. That's the point - the quoted comment demonstrates that the authors had thought about vendoring and rejected an approach that didn't handle it.

@pradyunsg
Member

pradyunsg commented Apr 19, 2025

If this PR had simply changed license = {text = "MIT"} to license = "MIT", without further comment, would anyone have objected? Would we even be having a debate?

No. In that vein, I think it's better to split this into 2 PRs: one moving us to PEP 639 style license = "MIT" and another moving to more precise license = ... for the contents of the distribution files.

@SpecLad Would you be willing to do so?

I will continue to argue for that (and block this PR) unless someone can demonstrate that there is a legal requirement that we report the expression currently in the PR.

I don't think there's ever going to be a legal requirement to change this. I do think it'll help people doing the "simple" thing of tracking licenses by looking at metadata rather than the contents of our distribution get a better picture of what the license for the code contained within it really is.

@SpecLad
Contributor Author

SpecLad commented Apr 19, 2025

@SpecLad Would you be willing to do so?

Certainly. I will update this PR to just focus on PEP 639.

@SpecLad
Contributor Author

SpecLad commented Apr 19, 2025

Yes, pip vendors packaging, as a convenience for our users, but that doesn't mean that packaging is part of pip

FWIW, I would argue that it does mean that. Vendoring is just copying files from another project into your project—and once you've done it, those files are part of your project, same as any other.

As such, I don't think it really makes sense to speak of MIT as the "overall" license of pip - some files in the project are under the MIT license, and some aren't.

(and indeed, Linux distros devendor pip's vendored dependencies - would they have to remove the extra clauses from pip's license as well?)

Yes. It seems reasonable to me - removing files also frees you of license obligations imposed by those files.

@SpecLad SpecLad force-pushed the pep-639 branch 2 times, most recently from c07565b to 15664cc Compare April 19, 2025 23:02
@pradyunsg pradyunsg changed the title Make pip's licensing metadata more comprehensive Switch license metadata to the PEP 639 format Apr 19, 2025
pyproject.toml Outdated
Comment on lines 9 to 10
"AUTHORS.txt", "LICENSE.txt",
"src/pip/_vendor/**/*LICENSE*", "src/pip/_vendor/**/*COPYING*",
Member

@pradyunsg pradyunsg Apr 19, 2025

While I'm being ~~annoying~~ ~~pedantic~~ thorough, I'll also ask for these to be changed into one-per-line to align with how we're formatting Python files.

Contributor Author

Heh, I'm a pedant myself. 🙂 Updated.
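As a rough way to see which files the glob patterns under review would pick up, the sketch below expands them with `pathlib`. It is an approximation: `pathlib` globbing is assumed to be close enough to the build backend's handling of `license-files` for a quick check, and the helper name `matched_license_files` is made up for this example.

```python
from pathlib import Path

# The patterns quoted in the review comment, one per line.
LICENSE_GLOBS = [
    "AUTHORS.txt",
    "LICENSE.txt",
    "src/pip/_vendor/**/*LICENSE*",
    "src/pip/_vendor/**/*COPYING*",
]


def matched_license_files(root, patterns=LICENSE_GLOBS):
    """Return sorted POSIX-style paths (relative to root) matching the globs."""
    root = Path(root)
    found = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            # Skip directories that happen to match, e.g. one named LICENSES.
            if path.is_file():
                found.add(path.relative_to(root).as_posix())
    return sorted(found)
```

Running it against a checkout and comparing the result with the `License-File` entries of a built wheel would show whether any vendored license file is being missed.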

Member

@ichard26 ichard26 left a comment

I compared the sdist from main and this PR and there aren't any unexpected changes. As long as this goes in pip 25.2 and not pip 25.1, LGTM. Thanks!

Member

Nit: this should probably be a process changelog entry, it's not at all a feature 🙂

Contributor Author

Oh, right. Fixed.

@notatallshaw
Member

notatallshaw commented Apr 20, 2025

I do think it'll help people doing the "simple" thing of tracking licenses by looking at metadata rather than the contents of our distribution get a better picture of what the license for the code contained within it really is.

Right, I have an earnest question here:

Isn't the purpose of the license expression so that users can algorithmically identify what licenses a distribution requires to use?

If the answer is yes, then we would be breaking that purpose by only giving the MIT license and not the other licenses. Code in pip's source distribution is not covered by the MIT license only.

If the answer is no, what is the value of the license expression over the previous license field?

Apologies if this is naive, I didn't follow the license expression PEP closely, and I just assumed this was the purpose, because it's a genuine issue I face that I can't currently determine the licenses in a library distribution by its metadata, and pip would be continuing this trend if it puts MIT only in the license expression.

@notatallshaw
Member

notatallshaw commented Apr 20, 2025

Again, earnest but possibly naive analysis of the issues here:

If this PR had simply changed license = {text = "MIT"} to license = "MIT", without further comment, would anyone have objected? Would we even be having a debate?

Does this create a license expression core metadata field? Because if so the PEP's goal states:

This PEP’s scope is limited to covering new mechanisms for documenting the license of a distribution package,

And pip's distribution package is not covered by MIT only, so as I read it this would be an explicit violation of the PEP's main goal.

@ichard26
Member

Does this create a license expression core metadata field?

Yes.
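For readers who want to check this themselves, the license fields can be read from a distribution's core metadata, which uses an email-header-like format. The sketch below parses a made-up `METADATA` text with the standard library; the sample content (project `example`, the vendored path) is invented for illustration and is not pip's actual metadata.

```python
from email.parser import HeaderParser

# Invented sample; License-Expression and License-File are the fields
# introduced by PEP 639 (core metadata version 2.4).
SAMPLE_METADATA = """\
Metadata-Version: 2.4
Name: example
Version: 1.0
License-Expression: MIT
License-File: LICENSE.txt
License-File: src/example/_vendor/somelib/LICENSE
"""


def license_fields(metadata_text):
    """Return (license expression or None, list of license file paths)."""
    msg = HeaderParser().parsestr(metadata_text)
    # License-File may repeat, so collect every occurrence.
    return msg.get("License-Expression"), msg.get_all("License-File") or []
```

For an installed package, `importlib.metadata.metadata(name)` returns a similar message object that supports the same `get` and `get_all` calls.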

image

@pfmoore
Member

pfmoore commented Apr 20, 2025

Certainly. I will update this PR to just focus on PEP 639.

Thanks. I'm happy with the updated PR keeping the license as MIT. I agree with @ichard26 though, let's hold this off until 25.2.

As such, I don't think it really makes sense to speak of MIT as the "overall" license of pip - some files in the project are under the MIT license, and some aren't.

I think pip's license has always been stated as MIT, and all existing contributions were made on that basis (I know mine have been). I don't think we can change "pip's license" (whatever that might mean) without agreement from all past contributors, and that's frankly impractical.

I think there's possibly an important distinction between the project license, and the (cumulative) license of a distribution file that bundles vendored dependencies. It's not clear from PEP 639 which of those two the license field is meant to contain, and how the other one is supposed to be recorded when there's a difference. I'd argue that license is for the project license, because (a) it's in the [project] section of pyproject.toml, and (b) that's historically what the old license field was used for. Furthermore, PyPI displays the license data (from License previously, now from License-Expression) as the project license, reinforcing that interpretation:

image

I understand that you think otherwise, and that's the crux of the disagreement here.

IMO, what's critical is that PyPI displays "MIT" as the pip project license. That's what has always been displayed, it's what users will expect to see, and it's the only part of the licensing that's actually under the control of the pip maintainers directly[1].

I appreciate that tools like license scanning software need to know the combined license expression for the distribution file and I'd support adding metadata to record that. But I don't support repurposing the existing license field for that use rather than its traditional use as the project's own license.

It's intensely frustrating that this wasn't picked up on when PEP 639 was being developed, but I think the correct action here is to fix PEP 639, not to mangle pip's license data. I'll post a request on Discourse for a fix in PEP 639, and I suggest we do nothing in pip (beyond switching to PEP 639 format as this PR now does) until that gets resolved.

Footnotes

  1. Yes, we could refuse to vendor a dependency if they changed the license to something we were unhappy about, but that's not direct control of the license.

@pfmoore
Member

pfmoore commented Apr 20, 2025

OK, I've posted on Discourse here.

@pfmoore
Member

pfmoore commented Apr 20, 2025

Isn't the purpose of the license expression so that users can algorithmically identify what licenses a distribution requires to use?

Maybe? That may be the purpose of the license expression, but it was never the purpose of the license metadata. If PEP 639 changed the intended purpose of the metadata, that was done without sufficient explanation, and without a full exploration of the implications.

If the answer is no, what is the value of the license expression over the previous license field?

Precision? Given that this is only an issue for the vanishingly small number of projects which bundle dependencies, I think it's hard to draw meaningful conclusions here. Either answer (yes or no) misrepresents the answer to some question. IMO, retaining the existing meaning for the existing field and creating a new field for "vendored dependency license expression(s)" is probably the only correct answer.

Apologies if this is naive, I didn't follow the license expression PEP closely, and I just assumed this was the purpose, because it's a genuine issue I face that I can't currently determine the licenses in a library distribution by its metadata, and pip would be continuing this trend if it puts MIT only in the license expression.

Not a problem - the license metadata debate rambled over so many years (literally!) I don't think anyone really followed the discussion more than superficially. I appreciate your problem, but it feels like this is something that PEP 639 failed to cover, rather than something where the meaning of the license key was intentionally changed. Certainly, I think that if it had been an intentional change, it wasn't explained clearly enough, as I would have asked "how do we record the project's chosen license" at the time 🙁

And pip's distribution package is not covered by MIT only, so as I read it this would be an explicit violation of the PEP's main goal.

But conversely, "limited to covering new mechanisms for documenting the license" reads very clearly to me as "changes how you write the data, but not what data you write". So as I read it, arguing that the license value should change because of PEP 639 is an explicit violation of the intent of the PEP.

@pfmoore
Member

pfmoore commented Apr 20, 2025

An excellent suggestion made in the Discourse thread is to publish our vendored dependencies' license information via a SBOM, which could be automatically generated via the vendoring utility - there's a PR to add PEP 770 support already.

@notatallshaw
Member

notatallshaw commented Apr 21, 2025

I think the SBOM / PEP 770 solves the practical problem of providing clarity on the licenses of individual files within the pip source distribution.

But I don't think it solves whether the intent of PEP 639 is for the project's license(s) or the distribution's license(s). Until the PEP authors have made that unambiguous I would be against switching to using the PEP 639 License-Expression core metadata field: regardless of whether pip provides the project's license or the distribution's licenses, downstream consumers may assume it means the other.

@pfmoore
Member

pfmoore commented Apr 21, 2025

Until the PEP authors have made that unambiguous I would be against switching to using the PEP 639 License-Expression core metadata field

OK, let's wait until:

  1. As you said, we have clarification on the intended use of the License-Expression field.
  2. We have a confirmed way of recording the project's license field, which is supported by PyPI (what I care about).

There's no rush here.
