Switch license metadata to the PEP 639 format #13335
base: main
While I'm generally in favour of dogfooding the latest standards, I'm wary of making our setuptools requirement so tight. Setuptools 77.0.1 was only released ~a month ago. This may cause some issues for our redistributors who want to build pip from source. Perhaps we can merge this for the pip 25.2 release in July? |
Sure, I don't mind. Just ping me if you need this rebased. |
Agreed. Let's leave this to 25.2, at least. Also, I'm not in favour of reporting pip's license as that complex expression. I don't know what legal implications there may be, but I consider pip's license to be MIT, and that's what we should report it as. The licenses of our vendored dependencies (in my view, at least) are just that - licenses of dependencies. Vendoring is simply a distribution method, and shouldn't affect our license. I'm happy for someone with more legal expertise than me to explain the implications, but as a maintainer, what I want people to see when they look at (for example) PyPI, is that we license pip under the MIT license. |
I don't really have a horse in this race, so I can set the license expression to whatever the maintainer consensus is. However, I do believe it would be correct to include the vendored licenses. In addition, PEP 639 explicitly calls out vendored dependencies as a case where multiple licenses need to be specified:
(although sadly it does not offer specific advice to such projects) |
pyproject.toml
Outdated
# Apache-2.0 OR BSD-2-Clause: packaging
# Apache-2.0: cachecontrol, distro, msgpack, requests
# BSD-2-Clause: pygments
# BSD-3-Clause: idna
# ISC: resolvelib
# MIT: dependency-groups, pip, platformdirs, pyproject-hooks, rich, setuptools,
#      urllib3, tomli, truststore
# MPL-2.0: certifi
# PSF-2.0: distlib, typing-extensions
license = "Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND ISC AND MIT AND MPL-2.0 AND PSF-2.0"
Not a blocking concern but I think we should have some automated validation of this, because I believe that someone will forget to update this when we update vendored libraries if we don't validate this somehow.
It shouldn't be too hard to, say:
- Put a text file next to vendor.txt that maps each library to its license expression.
- Add a nox session that verifies that
  - each library has a corresponding expression, and
  - the license expression in pyproject.toml is a combination of MIT and all the expressions of the vendored libraries.
However, I'm not doing that unless the team agrees that the complex license expression should be used to begin with.
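The check proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `name: expression` mapping format, the function names, and the naive splitting on ` AND ` / ` OR ` are all hypothetical and do not exist in pip's repository, and a real nox session would read `vendor.txt` and `pyproject.toml` from disk instead of taking strings.

```python
def parse_license_map(text):
    """Parse hypothetical 'name: expression' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, expr = line.partition(":")
        mapping[name.strip().lower()] = expr.strip()
    return mapping


def vendored_names(vendor_txt):
    """Extract project names from 'name==version' requirement lines."""
    names = []
    for line in vendor_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line.split("==", 1)[0].strip().lower())
    return names


def check_licenses(vendor_txt, license_map_txt, project_expression):
    """Return a list of problems; an empty list means the check passed."""
    license_map = parse_license_map(license_map_txt)
    problems = []
    # Assumption: the project expression is a flat chain of AND-ed identifiers.
    and_terms = {term.strip() for term in project_expression.split(" AND ")}
    allowed = {"MIT"}  # pip's own license
    for name in vendored_names(vendor_txt):
        expr = license_map.get(name)
        if expr is None:
            problems.append(f"{name}: no license expression recorded")
            continue
        # An 'A OR B' library is covered if either alternative is listed.
        alternatives = {alt.strip() for alt in expr.split(" OR ")}
        allowed |= alternatives
        if not alternatives & and_terms:
            problems.append(f"{name}: {expr} missing from project expression")
    if "MIT" not in and_terms:
        problems.append("pip's own MIT license is missing")
    for extra in sorted(and_terms - allowed):
        problems.append(f"{extra}: not required by any vendored library")
    return problems
```

The sketch does not parse SPDX expressions in general (no parentheses, no `WITH`); it only covers the flat shapes that appear in the comment block under review.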
I think it would be better to either (a) inline that metadata as comments on vendor.txt or (b) have that metadata be inferred from the license metadata of the packages themselves.
Conversely, I don't see why I stand by my preference - I'd like |
The difference is that we're redistributing the code -- requests is not redistributing certifi by vendoring it. We effectively have "different source files each governed by different licenses" within our distribution files (using language from the SPDX license expression docs, where they describe the rationale for having an We do this implicitly by including license files for each vendored package and this PR makes that explicit. |
AFAIK, no Linux distribution is devendoring pip other than Gentoo. |
Let me ask the question another way. Where is the license of pip, specifically, supposed to be recorded? We have no intention of changing pip’s license from MIT - so how do we record that information in the metadata? |
It's the top-level LICENSE file, so implied via that. Otherwise, we can also do something like:
# Licenses for vendored code are listed in parenthesis
license = "MIT AND (Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND ISC AND MIT AND MPL-2.0 AND PSF-2.0)"
AFAIK, it's valid to have
We don't record this today, and I don't see why we need to be trying to record this. I'll flip the question: Why do you want this, and how does the current license scheme cover that need? |
FWIW, if there's a desire to unambiguously label the license of specifically pip's own files, then there is a solution for that: you could add an |
We do. In the classifiers. And that's what has historically been displayed on PyPI.
I want this because I want it to be clear to everyone that pip is licensed under the MIT license. Prior to PEP 639, things were clear - we documented our license in the classifiers and in the I don't see how PEP 639 changes things. We still ship our dependencies' license files (and we now record them along with our own in The idea that we put some sort of combined license expression seems to have come from nowhere - the only reference to vendoring in PEP 639 was quoted above, but out of context. It was in relation to the question "Why can't we just extend the license classifiers and use them?" And in that context, "Because classifiers don't handle vendoring well" is a perfectly good answer. The scheme in PEP 639 does handle vendoring, though - it has Let me ask another question. If this PR had simply changed
There's no such desire - at least not from me. My desire is to retain the previous form, where we stated clearly that pip's overall license was MIT. I dislike putting license comments in every file. We'll inevitably forget one ( I'm sorry for making an issue of this. I hate licensing debates with a vengeance, and I would much rather we never had them. But more than just the debates, I hate the fact that those debates end up with people giving up on things that matter to them, just because it is too much effort to continue arguing. So with that in mind, it matters to me that we report |
No. In that vein, I think it's better to split this into 2 PRs: one moving us to PEP 639 style @SpecLad Would you be willing to do so?
I don't think there's ever going to be a legal requirement to change this. I do think it'll help people doing the "simple" thing of tracking licenses by looking at metadata rather than the contents of our distribution get a better picture of what the license for the code contained within it really is. |
Certainly. I will update this PR to just focus on PEP 639. |
FWIW, I would argue that it does mean that. Vendoring is just copying files from another project into your project—and once you've done it, those files are part of your project, same as any other. As such, I don't think it really makes sense to speak of MIT as the "overall" license of pip - some files in the project are under the MIT license, and some aren't.
Yes. It seems reasonable to me - removing files also frees you of license obligations imposed by those files. |
c07565b to 15664cc
pyproject.toml
Outdated
"AUTHORS.txt", "LICENSE.txt",
"src/pip/_vendor/**/*LICENSE*", "src/pip/_vendor/**/*COPYING*",
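To get a feel for what glob patterns like these select, here is a rough sketch that expands them with Python's `glob` module against a throwaway directory tree. The demo file names are made up for illustration, and Python's `glob` is only an approximation of how a build backend resolves `license-files` patterns, so treat the result as indicative.

```python
import glob
import os
import tempfile

PATTERNS = [
    "AUTHORS.txt",
    "LICENSE.txt",
    "src/pip/_vendor/**/*LICENSE*",
    "src/pip/_vendor/**/*COPYING*",
]

# Hypothetical files, used only to exercise the patterns.
DEMO_FILES = [
    "AUTHORS.txt",
    "LICENSE.txt",
    "src/pip/_vendor/rich/LICENSE",
    "src/pip/_vendor/rich/console.py",
    "src/pip/_vendor/urllib3/LICENSE.txt",
    "src/pip/_vendor/msgpack/COPYING",
]


def make_demo_tree(root, files=DEMO_FILES):
    """Create empty files under root so the patterns have something to match."""
    for rel in files:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", encoding="utf-8"):
            pass


def expand_license_globs(root, patterns=PATTERNS):
    """Return sorted POSIX-style paths under root matched by any pattern."""
    matched = set()
    for pattern in patterns:
        # recursive=True makes '**' match zero or more directory levels.
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            matched.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(matched)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(tmp)
        for path in expand_license_globs(tmp):
            print(path)
```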
While I'm being ~~annoying pedantic~~ thorough, I'll also ask for these to be changed into one-per-line to align with how we're formatting Python files.
Heh, I'm a pedant myself. 🙂 Updated.
I compared the sdist from main and this PR and there aren't any unexpected changes. As long as this goes in pip 25.2 and not pip 25.1, LGTM. Thanks!
news/13335.feature.rst
Outdated
Nit: this should probably be a process changelog entry, it's not at all a feature 🙂
Oh, right. Fixed.
Right, I have an earnest question here: Isn't the purpose of the license expression so that users can algorithmically identify what licenses a distribution requires to use? If the answer is yes, then we would be breaking that purpose by only giving the MIT license and not the other licenses. Code in pip's source distribution is not covered by the MIT license only. If the answer is no, what is the value of the license expression over the previous license field? Apologies if this is naive, I didn't follow the license expression PEP closely, and I just assumed this was the purpose, because it's a genuine issue I face that I can't currently determine the licenses in a library distribution by its metadata, and pip would be continuing this trend if it puts MIT only in the license expression. |
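As a rough illustration of the "algorithmic" use described above, a tool might pull the license identifiers out of the `License-Expression` field like this. The function name and the token-based extraction are assumptions made for the sketch; it is not a real SPDX expression parser, and it would also report exception identifiers that follow `WITH`.

```python
import re
from email.parser import Parser

# SPDX operators; everything else in the expression is treated as an identifier.
OPERATORS = {"AND", "OR", "WITH"}


def license_identifiers(metadata_text):
    """Return the sorted identifiers named in License-Expression, or None."""
    msg = Parser().parsestr(metadata_text)
    expression = msg.get("License-Expression")
    if expression is None:
        return None  # field absent, e.g. older metadata
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return sorted({t for t in tokens if t.upper() not in OPERATORS})


if __name__ == "__main__":
    sample = (
        "Metadata-Version: 2.4\n"
        "Name: example\n"
        "Version: 1.0\n"
        "License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)\n"
    )
    print(license_identifiers(sample))
```

For an installed distribution, the metadata text can be read with `importlib.metadata.distribution(name).read_text("METADATA")`. Whether the identifiers found there describe the project or everything bundled in the distribution file is exactly the question being debated here.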
Again, earnest but possibly naive analysis of the issues here:
Does this create a license expression core metadata field? Because if so the PEP's goal states:
And pip's distribution package is not covered by MIT only, so as I read it this would be an explicit violation of the PEP's main goal. |
Thanks. I'm happy with the updated PR keeping the license as MIT. I agree with @ichard26 though, let's hold this off until 25.2.
I think pip's license has always been stated as MIT, and all existing contributions were made on that basis (I know mine have been). I don't think we can change "pip's license" (whatever that might mean) without agreement from all past contributors, and that's frankly impractical. I think there's possibly an important distinction between the project license, and the (cumulative) license of a distribution file that bundles vendored dependencies. It's not clear from PEP 639 which of those two the I understand that you think otherwise, and that's the crux of the disagreement here. IMO, what's critical is that PyPI displays "MIT" as the pip project license. That's what has always been displayed, it's what users will expect to see, and it's the only part of the licensing that's actually under the control of the pip maintainers directly. I appreciate that tools like license scanning software need to know the combined license expression for the distribution file and I'd support adding metadata to record that. But I don't support repurposing the existing It's intensely frustrating that this wasn't picked up on when PEP 639 was being developed, but I think the correct action here is to fix PEP 639, not to mangle pip's license data. I'll post a request on Discourse for a fix in PEP 639, and I suggest we do nothing in pip (beyond switching to PEP 639 format as this PR now does) until that gets resolved. |
OK, I've posted on Discourse here. |
Maybe? That may be the purpose of the license expression, but it was never the purpose of the license metadata. If PEP 639 changed the intended purpose of the metadata, that was done without sufficient explanation, and without a full exploration of the implications.
Precision? Given that this is only an issue for the vanishingly rare number of projects which bundle dependencies, I think it's hard to draw meaningful conclusions here. Either answer (yes or no) misrepresents the answer to some question. IMO, retain the existing meaning for the existing field, and create a new field for "vendored dependency license expression(s)" is probably the only correct answer.
Not a problem - the license metadata debate rambled over so many years (literally!) I don't think anyone really followed the discussion more than superficially. I appreciate your problem, but it feels like this is something that PEP 639 failed to cover, rather than something where the meaning of the
But conversely, "limited to covering new mechanisms for documenting the license" reads very clearly to me as "changes how you write the data, but not what data you write". So as I read it, arguing that the license value should change because of PEP 639 is an explicit violation of the intent of the PEP. |
An excellent suggestion made in the Discourse thread is to publish our vendored dependencies' license information via a SBOM, which could be automatically generated via the |
Also, include all license files for the vendored dependencies inside the wheel, and in the `License-File` package metadata field. License files are included in distributions automatically, so remove them from `MANIFEST.in`.
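One way to sanity-check that claim on a built wheel is sketched below: read the `License-File` entries from `METADATA` and confirm each one is present under the `.dist-info/licenses/` directory, which is where PEP 639 places license files in wheels. The function name is hypothetical, and the sketch assumes a single top-level `.dist-info` directory.

```python
import zipfile
from email.parser import Parser


def missing_license_files(wheel_file):
    """Return License-File entries that have no matching file in the wheel."""
    with zipfile.ZipFile(wheel_file) as zf:
        names = set(zf.namelist())
        # Assumption: exactly one top-level '<name>-<version>.dist-info/METADATA'.
        metadata_name = next(
            n for n in names
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        )
        dist_info = metadata_name.rsplit("/", 1)[0]
        msg = Parser().parsestr(zf.read(metadata_name).decode("utf-8"))
    declared = msg.get_all("License-File") or []
    return [p for p in declared if f"{dist_info}/licenses/{p}" not in names]
```

`wheel_file` can be a path or a binary file object; an empty list means every declared license file was found.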
I think the SBOM / PEP 770 solves the practical problem of providing clarity on the licenses of individual files within the pip source distribution. But I don't think it settles whether PEP 639 intends the field to hold the project's license(s) or the distribution's license(s). Until the PEP authors have made that unambiguous, I would be against switching to the PEP 639 License-Expression core metadata field: regardless of whether pip provides the project's license or the distribution's licenses, downstream consumers may assume it means the other. |
OK, let's wait until:
There's no rush here. |