As requested by @milde in https://sourceforge.net/p/docutils/bugs/441/#7043/cdb8/8742/6e7f I'm opening this issue to allow for discussion on Docutils' public API, versioning policy, and deprecation.
Enhancement proposal 10 summarizes the discussion. It will be updated with new insights and decisions until a consensus is found.
This also relates to FR 87 on type annotations.
From Günter,
The idea is to reconcile the reality (Docutils is used as if it were mature) and the version number (<1) once the API is sufficiently defined and a deprecation policy is agreed.
The only text I can find on library versioning is at https://docutils.sourceforge.io/docs/dev/policies.html#version-identification, excerpt below:
Major releases (x.0, e.g. 1.0) will be rare, and will
represent major changes in API, functionality, or commitment. The
major number will be bumped to 1 when the project is
feature-complete, and may be incremented later if there is a major
change in the design or API. When Docutils reaches version 1.0,
the major APIs will be considered frozen.
For details, see the backwards compatibility policy.
Releases that change the minor number (x.y, e.g. 0.5) will be
feature releases; new features from the Docutils core will
be included.
Releases that change the micro number (x.y.z, e.g. 0.4.1) will be
bug-fix releases. No new features will be introduced in these
releases; only bug fixes will be included.
The proposed backwards compatibility policy reads:
Docutils' backwards compatibility policy follows the rules for Python in
PEP 387. ... The scope of the public API is laid out at the start of the backwards
compatibility rules
I propose two modifications to the policies, which will make future changes to Docutils easier to review and reason about, for project members as well as outside contributors.
Firstly, I propose adopting a formal versioning "system" such as Semantic Versioning (SemVer) or Calendar Versioning (CalVer).
Semantic versioning is nearly identical to the current versioning policy, and has the benefit of being a known quantity, reducing misunderstandings. A potential phrasing would be:
"Docutils follows SemVer. All changes must also follow the backwards compatibility policy."
What this does change is that the API is never considered complete, as in the original phrasing. Docutils isn't slated for inclusion in the standard library any more, and there will always be potential improvements and changes -- new parsers, new node types, changes in the HTML specification, et cetera.
The 1.0 release then does not need to be a "big deal", and future breaking changes can go to 2.0, 3.0, etc -- setuptools is now on 60.5.0!
Semantic versioning does have some drawbacks (https://snarky.ca/why-i-dont-like-semver/), so projects like pip have adopted calendar based versioning -- the major version is the year (22), and the minor version is either the current month or an increasing number. This relies on strong documentation and changelogs, so that when users upgrade it is obvious what changed (the idea being that every change breaks someone, so there is no contract that version numbers within a certain range mean no breakage). Luckily, Docutils has a good culture around changelogs and histories etc, so this would be easy to adopt.
I also suggest enumerating all modules, classes, and functions that form the public API. The current backwards compatibility policy references PEP 387, which is specifically for the Python project itself. Checking through the documentation on every change to identify if a name is public or private is error-prone (we are all human, and can miss things with no malintent) and time-consuming.
At the very least I would suggest docutils.nodes, the reader/writer/parser aliases, docutils.core, and the front end tools. (I'm happy to write out the full list if you give me modules/etc that should be part of the public API). It would also be useful to identify what parts of Docutils large downstream consumers use, and either create higher level abstractions or mark those as public API. (I would suggest Sphinx and MyST-parser).
It is also the established practice to mark names as private by prefixing them with an underscore. I suggest adopting this, as it gives strong guidance to downstream library authors what Docutils considers private and public. Making a private name public is far easier than going through a deprecation cycle for the reverse.
I'd suggest adding the ability to have exceptions to the policy, with removal after one minor version. My full suggested text is:
"Removal or significant alteration of any members of the public API will only take place after the behaviour has been marked as deprecated for two minor releases. Certain changes may have a shorter deprecation period of one minor release. This requires at least two project members to be in favour of doing so, and no members against.
Changes that may affect end-users (e.g. by requiring changes to the configuration file or potentially breaking custom style sheets) should be announced with a FutureWarning."
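As a minimal sketch of how the two kinds of announcement in the suggested text could look in code (the names `old_helper`, `new_helper`, and `announce_config_change` are hypothetical and not existing Docutils API):

```python
import warnings


def new_helper():
    """Replacement for old_helper() (hypothetical name)."""
    return "result"


def old_helper():
    """Deprecated spelling, kept for the deprecation period."""
    # Developer-facing: part of the public API is about to go away.
    warnings.warn(
        "old_helper() is deprecated and will be removed in a future "
        "release; use new_helper() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_helper()


def announce_config_change(setting):
    """Announce a change that may affect end users (hypothetical helper)."""
    # End-user-facing: e.g. a changed default in the configuration file.
    warnings.warn(
        f"The default of '{setting}' will change in a future release.",
        FutureWarning,
        stacklevel=2,
    )
```

By default Python hides DeprecationWarning unless it is triggered from code in `__main__`, while FutureWarning is shown, which matches the split between library users and end users in the text above.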
This post is intentionally opinionated so as to provide something to talk over / edit -- I'm not massively attached to any one thing!
A
for reference, amongst others, I took inspiration from:
https://setuptools.pypa.io/en/latest/development/releases.html
https://pip.pypa.io/en/latest/development/release-process/
https://numpy.org/neps/nep-0023-backwards-compatibility.html
I am in favor of SemVer; I think there are too many users relying on certain things
docutils is a user application not a programmer tool. means
might be API
I think with the advent of Sphinx this is no longer the case de facto. I would outline three main use-cases for Docutils:
A cursory search for usage of docutils as a library ((import|from) docutils) finds some 2,600 results (https://grep.app/search?q=%28import%7Cfrom%29%20docutils&regexp=true&filter[lang][0]=Python).
It would be an option to entirely deprecate all uses of docutils as a library (this is what pip did for pip 10), but I would strongly advise against this -- I think it is far better to have a clear idea of what we view the public API to be (both in terms of use as an application and as a library), and go from there. This might involve creating high level functions that mean downstream users don't have to reach into the internals of Docutils as often -- I have had to do this myself, and would prefer not to (part of the optparse -> argparse work I'm doing introduces some such abstractions).
Perhaps @chrisjsewell or @tk0miya might have views from MyST and Sphinx?
A
Docutils was designed from the start to be both.
Cf. PEP 258
and https://docutils.sourceforge.io/docs/index.html#docutils-stakeholders
For backwards compatibility, we also need to think about indirect use:
similar
An important part of Docutils are also the specifications of the document
model (Docutils Document Tree) and reStructuredText!
On 2022-01-16, Adam Turner wrote:
Yes, this is the Docutils version policy (the implementation details
could eventually move to an API description).
...
Docutils follows PEP 440.
The difference to Semantic Versioning is that
in Docutils, the "major" part of the version identifier is incremented
"if there is a major change in the design or API" while
in Semantic Versioning the "major" part "MUST be incremented if any backwards incompatible changes are introduced to the public API".
With your suggestion
"adding the ability to have exceptions to the policy, with removal after one minor version"
we are back to the current policy (which I would not dare to label Semantic Versioning).
...
In my understanding of the Docutils version policy, feature-complete means
that all essential parts of the system are functional, stable, and documented.
Docutils reached this state long ago, except for the documentation and API
specification.
Releasing 1.0 will not be a breaking change, rather a change in commitment.
Problems with an external list are the need to keep it in sync with the code base and discoverability.
This speaks in favour of defining the API in the code itself or in the docstrings.
It should be possible to make a valid guess whether an object is public
by looking at the source code or using pydoc.
The API specification document would then list the criteria for determining whether an object is part of the public API and give examples.
The abstract base classes for reader, writer, parser, and transforms come to mind.
Also docutils.__init__ and the "plug-in API" for the components.
The modules and classes intended as part of the public API come with
comprehensive docstrings which are a good guide to differentiate core objects from
auxiliary, internal objects and ambiguous (non-core) cases
(that may be useful for clients or may be used by existing applications).
The api/ section lists the existing API Reference Material.
For the Docutils project version number, the document model (docutils.dtd, doctree.txt)
and the rST specification are equally important (unless they get their own version identifier).
Yes, the final document should be a collaborative effort and we may
improve/clean the code base in the process.
OTOH, Sphinx is so closely intertwined with Docutils that it, IMO,
deserves a special handling ensuring synchronised changes.
This may be extended to other projects that use non-core objects
and are ready to follow the development, test, and report back.
Docutils' Python Coding Conventions build on PEP 8 but the leading underscores are not included in the summary nor used in praxi.
Starting to use leading underscores would give the
impression that all other objects are public.
We would need to change a lot of names in one go and this would
break all applications that use/customize non-core objects
(where we would change the name only as a precaution).
A massive name change would also complicate forensics with git blame.
This is why I propose to follow PEP 8 section Public and Internal Interfaces with one exemption:
the rule "Even with __all__ set appropriately, internal interfaces (...) should still be prefixed with a single leading underscore." does not apply.
("Do not break backwards compatibility just to comply with this PEP!" PEP 8 section 2.)
This leaves two ways to mark the public API in the code:
docstrings (for variables and class attributes, use the convention "docstring below definition") and __all__.
In addition, we may use type annotations as an additional indicator (cf. [feature-request:#87]).
Ambiguous cases are either kept undocumented or marked as "provisional".
This keeps them backwards compatible but open for easy upgrading to public state.
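A minimal sketch of the two markers, using a throwaway in-memory module (the module name `example_mod` and everything in it are made up for illustration):

```python
import sys
import types

# Source of a hypothetical module that marks its public API in both ways.
SOURCE = '''
"""Example module with an explicit public API."""

__all__ = ["publish", "DEFAULT_ENCODING"]

DEFAULT_ENCODING = "utf-8"
"""Encoding used when none is given (docstring below definition)."""

def publish(text):
    """Return the text unchanged (public and documented)."""
    return helper(text)

def helper(text):
    # No docstring and not listed in __all__: treated as internal.
    return text
'''

# Build the module in memory so the example is self-contained.
example_mod = types.ModuleType("example_mod")
exec(SOURCE, example_mod.__dict__)
sys.modules["example_mod"] = example_mod

# A star import only picks up the names listed in __all__.
namespace = {}
exec("from example_mod import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Here `helper` stays importable by its full name, so nothing breaks for existing users, but it is neither listed in `__all__` nor documented.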
Last edit: Günter Milde 2022-01-18
PEP 440 is more about how version numbers are parsed, it is not a versioning policy.
I agree with your summary. The issue is that I think users and downstream developers will expect the latter -- having backwards incompatible changes in a minor version violates the principle of least surprise.
I'd argue more a merger of the two than just back to the current policy. Pure semantic versioning allows you to make a backwards incompatible change with no warning or deprecation period, which is generally not how Python projects operate. What I proposed is the SemVer specification in terms of breaking changes (removals, changes in the DTD/default templates/etc), but codifying a deprecation period on top of that.
I'd agree with you here - what I took issue to was the word "frozen" -- that e.g. wouldn't let us implement nested markup parsing after version 1.0.
I agree with you. The problem is how best to do this. I would argue that type annotations and docstrings are orthogonal to the public API question -- for developers working on Docutils itself, both are helpful! (I would miss docstrings less than type hints, personally).
I see your point on churn and names re underscores, but it is possible via module level __getattr__ or similar to do a mass deprecation of names. On the concern that introducing underscores in some places might mean people assume everything else is public -- I would argue that the people who are likely to do that do it already, as nothing has underscores.
This is one of the reasons I proposed the helpers in the argparse changeset by the way -- currently downstream developers have to assume a lot of knowledge of Docutils' internals -- I'd like to introduce higher level abstractions to make e.g. getting default settings or parsing a list of lines of reST easier -- ideally one function call.
An alternative idea is an @public or @private no-op decorator that we could use to signal API status. This could then be used to generate the API list, at least in terms of code.
Clearly Docutils still uses svn, but if this is a concern then .git-blame-ignore-revs can be used (brief article)
Concrete proposal RE public API in code:
- __all__ as you proposed for all global names (classes, module level functions, module level variables, etc)
- __getattr__ or similar (with full backwards compatibility and a deprecation period, and ideally helper functions to mean that downstream users don't need to use internal things as much)
I strongly think there is a benefit to moving more in line with the Python ecosystem as a whole in terms of how we name things and define the public API, but if this is entirely off the table, I think the proposals we've outlined would still be a net improvement.
A
Docstrings are (at least indirectly) related to public API in PEP 8:
I'd rather not.
I don't insist on type annotations as indicator.
However, adding type annotations to existing code should concentrate on the
public API.
Agreed.
I'll attach a draft document that tries to sum up what we have reached so
far (work in progress).
Fair enough.
This is good to hear, and I agree that to begin with we should concentrate on the public API.
Look forwards to it!
A
Draft for a "Public API and Backwards Compatibility Policy" document.
Sorry for the (long) delay. I've posted my redrafted suggestion at https://github.com/AA-Turner/docutils/pull/14
In my personal opinion (informed by (limited) experience as a PEP editor, but not with that hat on) I don't think we need the formality of "Docutils Enhancement Proposals". That is useful for a project with a large and diffuse team where a single design document can be used as a rallying point and reference. Docutils currently has 3 active contributors to the code from what I can tell (you, me, and Engelbert), only two of whom are "project developers".
I think the level of discussion we have on issues is reasonable to reach a good conclusion, and the work needed to write a "DEP" might just be better spent on the issue tracker.
A
Thanks for your proposal. Some points:
I don't think we need to include the overview of docutils use(r)s in the
"Policies" document.
Do you mean the provided output templates [#]?
How about provided style sheets [#] (unless marked as "provisional")?
.. [#] template.txt, default.tex, titlepage.tex, xelatex.tex
.. [#] html4css1.css, minimal.css, docutils.sty (LaTeX), styles.odt
Why __public__ and not __all__?
Are there other packages using this variable name for this purpose?
Do you want this
interdicting all incompatible changes in "minor" releases), or
releases, similar to :PEP:`387`)?
I don't think we need an additional restriction if we choose SemVer:
downstream users will be prepared for incompatible changes in a new
"major" version.
...
The issue tracker is good for ongoing discussion, but not as reference
for an informed final decision on the request/proposal,
for the rationale and context in the "Policies" section on the public API,
for the rationale and context when the policy/code will need adaption in,
say, 10 years time. (I'll prefer to read a condensed/edited document
instead of all of the comments in the issue tracker by then.)
Also, moving reStructuredText documents to a new home/host is far easier
than "saving" tracker issues.
Removed
Yes, I updated to include the templates and stylesheets
__all__ has special meaning to the interpreter for from module import *. Perhaps it doesn't matter, but I didn't want to conflate the two usages.
I'd prefer to adopt SemVer, but I dropped that from the proposal for later conversation. Would you be happy with adopting SemVer and updating the language in this section to "should" rather than "must"?
Fair enough, perhaps "DEP" is a reasonable way forwards.
A
On 2022-02-23, Adam Turner wrote:
IMO, the special meaning of __all__ overlaps with "is a public object".
However, I am not sure whether defining and maintaining __all__ lists for all
modules and classes in the docutils package is worth the effort.
(It could help to suppress non-public objects from the "help" output and auto-generated API docs without the need of re-naming them with leading underscore.)
The advice: "If you want to know whether an object is public, check the
docstring" is simple, easy to follow and makes it easy to implement.
...
Semantic Versioning seems to emerge as the "consensus of least surprise".
I am fine with this. I prefer to keep it without additional constraints.
I updated the proposal in the sandbox accordingly.
Semantic Versioning would imply that removals of deprecated attributes and objects cannot be done in minor versions after 1.0.
This means we need to adapt announcements like "will be removed in 1.2".
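To make the consequence concrete, here is a small sketch of how such announcements could be recomputed. The function name `earliest_removal` is hypothetical, and the handling of 0.x versions is an assumption, since SemVer places no constraints on 0.y.z releases.

```python
def earliest_removal(deprecated_in: str) -> str:
    """Return the first version that may remove something deprecated in `deprecated_in`.

    Assumes strict SemVer once the major number is >= 1: a removal is a
    backwards incompatible change, so it has to wait for the next major
    release.
    """
    major, minor, *_ = (int(part) for part in deprecated_in.split("."))
    if major == 0:
        # Assumption: before 1.0, removals may still happen in the next minor.
        return f"0.{minor + 1}"
    return f"{major + 1}.0"


print(earliest_removal("1.1"))
```

Under these assumptions, something deprecated in 1.1 with the note "will be removed in 1.2" would instead have to say "will be removed in 2.0".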
Last edit: Günter Milde 2022-04-06
my2c