
#89 Public API, versioning, and deprecation

Default
open
nobody
None
5
2025-04-29
2022-01-16
Adam Turner
No

As requested by @milde in https://sourceforge.net/p/docutils/bugs/441/#7043/cdb8/8742/6e7f I'm opening this issue to allow for discussion on Docutils' public API, versioning policy, and deprecation.

Enhancement proposal 10 summarizes the discussion. It will be updated with new insights and decisions until a consensus is found.

This also relates to FR 87 on type annotations.

From Günter,

The idea is to reconcile the reality (Docutils is used as if it were mature) and the version number (<1) once the API is sufficiently defined and a deprecation policy is agreed.

The only text I can find on library versioning is at https://docutils.sourceforge.io/docs/dev/policies.html#version-identification, excerpt below:

Major releases (x.0, e.g. 1.0) will be rare, and will
represent major changes in API, functionality, or commitment. The
major number will be bumped to 1 when the project is
feature-complete, and may be incremented later if there is a major
change in the design or API. When Docutils reaches version 1.0,
the major APIs will be considered frozen.
For details, see the backwards compatibility policy_.

Releases that change the minor number (x.y, e.g. 0.5) will be
feature releases; new features from the Docutils core_ will
be included.

Releases that change the micro number (x.y.z, e.g. 0.4.1) will be
bug-fix releases. No new features will be introduced in these
releases; only bug fixes will be included.

The proposed backwards compatibility policy reads:

Docutils' backwards compatibility policy follows the rules for Python in
PEP 387. ... The scope of the public API is laid out at the start of the backwards
compatibility rules

I propose two modifications to the policies, which will make future changes to Docutils easier to review and reason about, for project members as well as outside contributors.

Firstly, I propose adopting a formal versioning "system" such as Semantic Versioning (SemVer) or Calendar Versioning (CalVer).

Semantic versioning is nearly identical to the current versioning policy, and has the benefit of being a known quantity, reducing misunderstandings. A potential phrasing would be:

"Docutils follows SemVer. All changes must also follow the backwards compatibility policy."

What this does change is that the API is never considered complete, as in the original phrasing. Docutils isn't slated for inclusion in the standard library any more, and there will always be potential improvements and changes -- new parsers, new node types, changes in the HTML specification, et cetera.

The 1.0 release then does not need to be a "big deal", and future breaking changes can go to 2.0, 3.0, etc -- setuptools is now on 60.5.0!
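As a rough illustration of the SemVer rule being proposed, the sketch below maps a kind of change to the next version number. The function name `next_version` and the change labels are hypothetical, and the rule shown only applies from 1.0.0 onwards (SemVer allows anything to change in 0.y.z releases).

```python
def next_version(current, change):
    """Return the next SemVer version for a 'breaking', 'feature', or 'fix' change.

    Hypothetical helper for illustration; assumes a plain MAJOR.MINOR.PATCH string.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "breaking":
        # Any backwards incompatible change to the public API bumps MAJOR.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backwards compatible additions bump MINOR and reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backwards compatible bug fixes bump PATCH only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Under this rule a removal can never ship in a minor release, which is the main practical difference from the current policy.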

Semantic versioning does have some drawbacks (https://snarky.ca/why-i-dont-like-semver/), so projects like pip have adopted calendar based versioning -- the major version is the year (22), and the minor version is either the current month or an increasing number. This relies on strong documentation and changelogs, so that when users upgrade it is obvious what changed (the idea being that every change breaks someone, so there is no contract that version numbers within a certain range mean no breakage). Luckily, Docutils has a good culture around changelogs and histories etc, so this would be easy to adopt.


I also suggest enumerating all modules, classes, and functions that form the public API. The current backwards compatibility policy references PEP 387, which is specifically for the Python project itself. Checking through the documentation on every change to identify if a name is public or private is error-prone (we are all human, and can miss things with no ill intent) and time-consuming.

At the very least, I would suggest docutils.nodes, the reader/writer/parser aliases, docutils.core, and the front end tools. (I'm happy to write out the full list if you give me modules/etc that should be part of the public API.) It would also be useful to identify which parts of Docutils large downstream consumers use, and either create higher level abstractions or mark those as public API. (I would suggest Sphinx and MyST-parser.)

It is also the established practice to mark names as private by prefixing them with an underscore. I suggest adopting this, as it gives strong guidance to downstream library authors what Docutils considers private and public. Making a private name public is far easier than going through a deprecation cycle for the reverse.

I'd suggest adding the ability to have exceptions to the policy, with removal after one minor version. My full suggested text is:

"Removal or significant alteration of any members of the public API will only take place after the behaviour has been marked as deprecated for two minor releases. Certain changes may have a shorter deprecation period of one minor release. This requires at least two project members to be in favour of doing so, and no members against.

Changes that may affect end-users (e.g. by requiring changes to the configuration file or potentially breaking custom style sheets) should be announced with a FutureWarning."
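A minimal sketch of how the two warning categories in the suggested text could be emitted with the standard `warnings` module. The decorator name `deprecated`, the example functions, and the version string "2.0" are hypothetical and not part of Docutils.

```python
import functools
import warnings


def deprecated(removal_version, category=DeprecationWarning):
    """Mark a callable as deprecated, naming the release that will remove it.

    Hypothetical helper: use DeprecationWarning for library API changes and
    FutureWarning for changes that end-users should see.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed "
                f"in version {removal_version}",
                category,
                stacklevel=2,  # point at the caller, not at this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("2.0")
def old_helper():
    return "still works"


@deprecated("2.0", category=FutureWarning)
def old_default_stylesheet():
    return "legacy.css"
```

The distinction matters because Python hides DeprecationWarning by default unless it is triggered by code in `__main__`, while FutureWarning is shown by default.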

This post is intentionally opinionated so as to provide something to talk over / edit -- I'm not massively attached to any one thing!

A


for reference, amongst others, I took inspiration from:
https://setuptools.pypa.io/en/latest/development/releases.html
https://pip.pypa.io/en/latest/development/release-process/
https://numpy.org/neps/nep-0023-backwards-compatibility.html

Discussion

  • engelbert gruber

    I am in favor of semver; I think there are too many users relying on certain things

    docutils is a user application, not a programmer tool. This means

    • any script.py
    • any css/dom-construct
    • any tex-macro

    might be API

     
  • Adam  Turner

    Adam Turner - 2022-01-17

    docutils is a user application not a programmer tool

    I think with the advent of Sphinx this is no longer the case de facto. I would outline three main use-cases for Docutils:

    1. as an application, through one of the front-end tools
    2. in direct dependencies, such as Sphinx or MyST-parser
    3. when building plugins for tools in (2)

    A cursory search for usage of docutils as a library ((import|from) docutils) finds some 2,600 results (https://grep.app/search?q=%28import%7Cfrom%29%20docutils&regexp=true&filter[lang][0]=Python).
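A text search like the one above counts files but not which names are used. As a hedged sketch, the standard `ast` module can list the Docutils modules and names a given source file imports without executing it. The function name `docutils_imports` is hypothetical.

```python
import ast


def docutils_imports(source):
    """Return the sorted docutils modules/names imported by `source`.

    Hypothetical helper: parses the code only, so docutils need not be installed.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "docutils" or alias.name.startswith("docutils."):
                    found.add(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == "docutils" or node.module.startswith("docutils."):
                for alias in node.names:
                    found.add(f"{node.module}.{alias.name}")
    return sorted(found)
```

Run over the source trees of large downstream projects, a tally of these names would give a first approximation of the de facto public API.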

    It would be an option to entirely deprecate all uses of docutils as a library (this is what pip did for pip 10), but I would strongly advise against this -- I think it is far better to have a clear idea of what we view the public API to be (both in terms of use as an application and as a library), and go from there. This might involve creating high level functions that mean downstream users don't have to reach into the internals of Docutils as often -- I have had to do this myself, and would prefer not to (part of the optparse -> argparse work I'm doing introduces some such abstractions).

    Perhaps @chrisjsewell or @tk0miya might have views from MyST and Sphinx?

    A

     
    • Günter Milde

      Günter Milde - 2022-01-17

      docutils is a user application not a programmer tool

      Docutils was designed from the start to be both.
      Cf. PEP 258
      and https://docutils.sourceforge.io/docs/index.html#docutils-stakeholders

      ... I would outline three main use-cases for Docutils:

      1. as an application, through one of the front-end tools
      2. in direct dependencies, such as Sphinx or MyST-parser
      3. when building plugins for tools in (2)
      4. when building plugins for Docutils (e.g. myst-docutils and pycmark)

      For backwards compatibility, we also need to think about indirect use:

      1. as author of documents that will be processed by Docutils, Sphinx or
        similar
      2. as maintainer of a project that uses Docutils in its tool chain.

      An important part of Docutils are also the specifications of the document
      model (Docutils Document Tree) and reStructured Text!

       
  • Günter Milde

    Günter Milde - 2022-01-18

    On 2022-01-16, Adam Turner wrote:

    The only text I can find on library versioning is at
    https://docutils.sourceforge.io/docs/dev/policies.html#version-identification,

    Yes, this is the Docutils version policy (the implementation details
    could eventually move to an API description).
    ...

    Firstly, I propose adopting a formal versioning "system" such as Semantic Versioning (SemVer) or Calendar Versioning (CalVer).
    ...
    Semantic versioning is nearly identical to the current versioning policy

    Docutils follows PEP 440.
    The difference to Semantic Versioning is that
    in Docutils, the "major" part of the version identifier is incremented
    "if there is a major change in the design or API" while
    in Semantic Versioning the "major" part "MUST be incremented if any backwards incompatible changes are introduced to the public API".

    With your suggestion
    "adding the ability to have exceptions to the policy, with removal after one minor version"
    we are back to the current policy (which I would not dare to label Semantic Versioning).
    ...

    What this does change is that the API is never considered complete, as
    in the original phrasing.

    In my understanding of the Docutils version policy, feature-complete means
    that all essential parts of the system are functional, stable, and documented.
    Docutils reached this state long ago, except for the documentation and API
    specification.

    The 1.0 release then does not need to be a "big deal",

    Releasing 1.0 will not be a breaking change, rather a change in commitment.


    I also suggest enumerating all modules, classes, and functions that
    form the public API.

    Problems with an external list are the need to keep it in sync with the code base and discoverability.
    This speaks in favour of defining the API in the code itself or in the docstrings.
    It should be possible to do a valid guess whether an object is public
    by looking at the source code or using pydoc.
    The API specification document would then list the criteria for determining whether an object is part of the public API and give examples.

    At the very least, I would suggest docutils.nodes, the
    reader/writer/parser aliases, docutils.core, and the front end tools.

    The abstract base classes for reader, writer, parser, and transforms come to mind.
    Also docutils.__init__ and the "plug-in API" for the components.

    The modules and classes intended as part of the public API come with
    comprehensive docstrings which are a good guide to differentiate core objects from
    auxiliary, internal objects and ambiguous (non-core) cases
    (that may be useful for clients or may be used by existing applications).

    The api/ section lists the existing API Reference Material.

    For the Docutils project version number, the document model (docutils.dtd, doctree.txt)
    and the rST specification are equally important (unless they get their own version identifier).

    It would also be useful to identify what parts of Docutils large
    downstream consumers use, and either create higher level abstractions
    or mark those as public API. (I would suggest Sphinx and MyST-parser).

    Yes, the final document should be a collaborative effort and we may
    improve/clean the code base in the process.
    OTOH, Sphinx is so closely intertwined to Docutils that it, IMO,
    deserves a special handling ensuring synchronised changes.
    This may be extended to other projects that use non-core objects
    and are ready to follow the development, test, and report back.


    It is also the established practice
    to mark names as private by prefixing them with an underscore.

    I suggest adopting this, as it gives strong guidance to downstream
    library authors what Docutils considers private and public.

    Docutils' Python Coding Conventions build on PEP 8 but the leading underscores are not included in the summary nor used in practice.
    Starting to use leading underscores would give the
    impression that all other objects are public.
    We would need to change a lot of names in one go and this would
    break all applications that use/customize non-core objects
    (where we would change the name only as a precaution).
    A massive name change would also complicate forensics with git blame.

    Making a private name public is far easier than going through a
    deprecation cycle for the reverse.

    This is why I propose to follow PEP 8 section Public and Internal Interfaces with one exemption:
    the rule "Even with __all__ set appropriately, internal interfaces (...) should still be prefixed with a single leading underscore." does not apply.
    ("Do not break backwards compatibility just to comply with this PEP!" PEP 8 section 2.)

    This leaves two ways to mark the public API in the code:
    docstrings (for variables and class attributes, use the convention "docstring below definition")
    and __all__.
    In addition, we may use type annotations as an additional indicator (cf. [feature-request:#87]).

    • Objects with comment instead of a docstring are not part of the core API.
    • Objects where the docstring says "internal" are not part of the core API.
    • Objects where the docstring says "provisional" are exempt from the backwards compatibility promise.

    Ambiguous cases are either kept undocumented or marked as "provisional".
    This keeps them backwards compatible but open for easy upgrading to public state.

     

    Last edit: Günter Milde 2022-01-18
    • Adam  Turner

      Adam Turner - 2022-01-20

      Docutils follows PEP 440

      PEP 440 is more about how version numbers are parsed, it is not a versioning policy.

      in Docutils, the "major" part of the version identifier is incremented "if there is a major change in the design or API" while in Semantic Versioning the "major" part "MUST be incremented if any backwards incompatible changes are introduced to the public API".

      I agree with your summary. The issue is that I think users, downstream developers will expect the latter -- having backwards incompatible changes in a minor version violates the principle of least surprise.

      With your suggestion "adding the ability to have exceptions to the policy, with removal after one minor version" we are back to the current policy

      I'd argue more a merger of the two than just back to the current policy. Pure semantic versioning allows you to make a backwards incompatible change with no warning or deprecation period, which is generally not how Python projects operate. What I proposed is the SemVer specification in terms of breaking changes (removals, changes in the DTD/default templates/etc), but codifying a deprecation period on top of that.

      In my understanding of the Docutils version policy, feature-complete means that all essential parts of the system are functional, stable, and documented.

      I'd agree with you here - what I took issue to was the word "frozen" -- that e.g. wouldn't let us implement nested markup parsing after version 1.0.


      ... keep it in sync with the code base and discoverability. This speaks in favour of defining the API in the code itself or in the docstrings

      I agree with you. The problem is how best to do this. I would argue that type annotations and docstrings are orthogonal to the public API question -- for developers working on Docutils itself, both are helpful! (I would miss docstrings less than type hints, personally).

      I see your point on churn and names re underscores, but it is possible via module level __getattr__ or similar to do a mass deprecation of names. Introducing underscores in some places might mean people assume everything else is public -- I would argue, though, that the people who are likely to do that do it already -- as nothing has underscores.
      This is one of the reasons I proposed the helpers in the argparse changeset by the way -- currently downstream developers have to assume a lot of knowledge of Docutils' internals -- I'd like to introduce higher level abstractions to make e.g. getting default settings or parsing a list of lines of reST easier -- ideally one function call.
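A minimal sketch of the module-level `__getattr__` approach (PEP 562, Python 3.7+), assuming a mapping from old names to new underscore-prefixed names. The helper `install_deprecated_aliases` and the module `demo` are hypothetical. In a real module the `__getattr__` function would simply be defined at the top level of that module.

```python
import types
import warnings


def install_deprecated_aliases(module, aliases):
    """Let old names resolve to renamed private objects, with a warning.

    `aliases` maps each old public name to its new underscore-prefixed name.
    """
    def __getattr__(name):
        # Called only when normal attribute lookup on the module fails.
        if name in aliases:
            new_name = aliases[name]
            warnings.warn(
                f"{module.__name__}.{name} is deprecated; "
                f"it is now internal ({new_name})",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(module, new_name)
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {name!r}"
        )

    module.__getattr__ = __getattr__


# Stand-in module so the sketch runs on its own.
demo = types.ModuleType("demo")
demo._helper = lambda: "ok"
install_deprecated_aliases(demo, {"helper": "_helper"})
```

Existing callers of `demo.helper` keep working during the deprecation period, and the alias table can be dropped in one step afterwards.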

      An alternative idea is an @public or @private no-op decorator that we could use to signal API status. This could then be used to generate the API list, at least in terms of code.

      def public(f):
          return f
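To generate the API list from such a decorator, it would need to record what it decorates. A hedged variant is sketched below. The registry name `PUBLIC_API` and the decorated function `render` are hypothetical.

```python
PUBLIC_API = []  # hypothetical registry of fully qualified public names


def public(obj):
    """Record `obj` as part of the public API and return it unchanged."""
    PUBLIC_API.append(f"{obj.__module__}.{obj.__qualname__}")
    return obj


@public
def render(source):
    """Return `source` in upper case (stand-in for a real public function)."""
    return source.upper()


def scratch_helper(source):
    # Not decorated, so it never appears in PUBLIC_API.
    return source.strip()
```

The decorator still has no effect on behaviour at call time. The only cost is one list append per decorated object at import.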
      

      A massive name change would also complicate forensics with git blame.

      Clearly Docutils still uses svn, but if this is a concern then .git-blame-ignore-revs can be used (brief article)


      Concrete proposal RE public API in code:

      • Use __all__ as you proposed for all global names (classes, module level functions, module level variables, etc)
      • Adopt underscores for new private names
      • Consider introducing underscores for existing names, through __getattr__ or similar (with full backwards compatibility and a deprecation period, and ideally helper functions to mean that downstream users don't need to use internal things as much)
      • Don't use type annotations as an indication of status in the public API -- they are too helpful for that. Move to use them everywhere.
      • Formalise the wording for docstrings for public vs private vs provisional (ideally this would be a single regex pattern).

      I strongly think there is a benefit to moving more in line with the Python ecosystem as a whole in terms of how we name things and define the public API, but if this is entirely off the table, I think the proposals we've outlined would still be a net improvement.

      A

       
      • Günter Milde

        Günter Milde - 2022-01-21

        in Docutils, the "major" part of the version identifier is incremented
        "if there is a major change in the design or API" while in Semantic
        Versioning the "major" part "MUST be incremented if any backwards
        incompatible changes are introduced to the public API".

        I agree with your summary. The issue is that I think users, downstream
        developers will expect the latter -- having backwards incompatible
        changes in a minor version violates the principle of least surprise.

        What I proposed is the
        SemVer specification in terms of breaking changes (removals, changes in
        the DTD/default templates/etc), but codifying a deprecation period on
        top of that.


        I would argue that type annotations and docstrings are orthogonal to
        the public API question -- for developers working on Docutils itself,
        both are helpful!

        Docstrings are (at least indirectly) related to public API in PEP 8:

        Documented interfaces are considered public, unless the documentation
        explicitly declares them to be provisional or internal interfaces
        exempt from the usual backwards compatibility guarantees.
        All undocumented interfaces should be assumed to be internal.


        Concrete proposal RE public API in code:

        • Use __all__ as you proposed for all global names (classes, module level functions, module level variables, etc)

        • Adopt underscores for new private names

        • Consider introducing underscores for existing names, through __getattr__ or similar (with full backwards compatibility and a deprecation period, and ideally helper functions to mean that downstream users don't need to use internal things as much)

        I'd rather not.

        • Don't use type annotations as an indication of status in the public API -- they are too helpful for that. Move to use them everywhere.

        I don't insist on type annotations as indicator.
        However, adding type annotations to existing code should concentrate on the
        public API.

        • Formalise the wording for docstrings for public vs private vs provisional (ideally this would be a single regex pattern).

        Agreed.

        I'll attach a draft document that tries to sum up what we have reached so
        far (work in progress).

         
        • Adam  Turner

          Adam Turner - 2022-01-24

          Consider introducing underscores for existing names, through __getattr__ or similar (with full backwards compatibility and a deprecation period, and ideally helper functions to mean that downstream users don't need to use internal things as much)

          I'd rather not.

          Fair enough.

          I don't insist on type annotations as indicator. However, adding type annotations to existing code should concentrate on the public API.

          This is good to hear, and I agree that to begin with we should concentrate on the public API.

          I'll attach a draft document that tries to sum up what we have reached so
          far (work in progress).

          Look forwards to it!

          A

           
  • Günter Milde

    Günter Milde - 2022-01-25

    Draft for a "Public API and Backwards Compatibility Policy" document.

     
    • Adam  Turner

      Adam Turner - 2022-02-16

      Sorry for the (long) delay. I've posted my redrafted suggestion at https://github.com/AA-Turner/docutils/pull/14

      In my personal opinion (informed by (limited) experience as a PEP editor, but not with that hat on) I don't think we need the formality of "Docutils Enhancement Proposals". That is useful for a project with a large and diffuse team where a single design document can be used as a rallying point and reference. Docutils currently has 3 active contributors to the code from what I can tell (you, me, and Engelbert), only two of whom are "project developers".

      I think the level of discussion we have on issues is reasonable to reach a good conclusion, and the work needed to write a "DEP" might just be better spent on the issue tracker.

      A

       
      • Günter Milde

        Günter Milde - 2022-02-23

        I've posted my redrafted suggestion at
        https://github.com/AA-Turner/docutils/pull/14

        Thanks for your proposal. Some points:

        I don't think we need to include the overview of docutils use(r)s in the
        "Policies" document.

        Docutils public APIs are:
        ... * the Docutils writer templates

        Do you mean the provided output templates [#]?
        How about provided style sheets [#]
        (unless marked as "provisional")?

        .. [#] template.txt, default.tex, titlepage.tex, xelatex.tex
        .. [#] html4css1.css, minimal.css, docutils.sty (LaTeX), styles.odt

        • behaviour and names of all items included in the __public__
          attribute of their parent objects.

        Why __public__ and not __all__?
        Are there other packages using this variable name for this purpose?

        Deprecation periods

        Do you want this

        • in addition to SemVer_ (i.e. also for changes in "major" releases,
          interdicting all incompatible changes in "minor" releases), or
        • instead of SemVer_ (i.e. for "minor/medium" changes in "minor"
          releases, similar to :PEP:387)?

        I don't think we need an additional restriction if we choose SemVer:
        downstream users will be prepared for incompatible changes in a new
        "major" version.

        ...

        I think the level of discussion we have on issues is reasonable to
        reach a good conclusion, and the work needed to write a "DEP" might
        just be better spent on the issue tracker.

        The issue tracker is good for ongoing discussion, but not as reference

        • for an informed final decision on the request/proposal,

        • for the rationale and context in the "Policies" section on the public API,

        • for the rationale and context when the policy/code will need adaption in,
          say, 10 years time. (I'll prefer to read a condensed/edited document
          instead of all of the comments in the issue tracker by then.)

        Also, moving reStructuredText documents to a new home/host is far easier
        than "saving" tracker issues.

         
        • Adam  Turner

          Adam Turner - 2022-02-23

          I don't think we need to include the overview of docutils use(r)s in the
          "Policies" document.

          Removed

          Do you mean the provided output templates

          Yes, I updated to include the templates and stylesheets

          Why __public__ and not __all__?

          __all__ has special meaning to the interpreter for from module import *. Perhaps it doesn't matter, but I didn't want to conflate the two usages.
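The special meaning of `__all__` can be seen in a small self-contained sketch. The module `api_demo` and its functions are hypothetical and built in memory only so the example runs on its own.

```python
import sys
import types

# Build a throwaway module whose __all__ lists only one of its two functions.
api_demo = types.ModuleType("api_demo")
exec(
    "__all__ = ['publish']\n"
    "def publish():\n"
    "    return 'public'\n"
    "def helper():\n"
    "    return 'internal'\n",
    api_demo.__dict__,
)
sys.modules["api_demo"] = api_demo

# A star import copies only the names listed in __all__.
namespace = {}
exec("from api_demo import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))

# Names left out of __all__ are still reachable by explicit attribute access.
still_reachable = api_demo.helper()
```

Here `exported` contains only `publish`, while `helper` stays importable by name. A separate attribute such as `__public__` would carry no meaning for the interpreter and would need to be read by project tooling.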

          Do you want this ... in addition to [or] instead of SemVer

          I'd prefer to adopt SemVer, but I dropped that from the proposal for later conversation. Would you be happy with adopting SemVer and updating the language in this section to "should" rather than "must"?


          Fair enough, perhaps "DEP" is a reasonable way forwards.

          A

           
          • Günter Milde

            Günter Milde - 2022-03-10

            On 2022-02-23, Adam Turner wrote:

            Why __public__ and not __all__?

            __all__ has special meaning to the interpreter for from module import *. Perhaps it doesn't matter, but I didn't want to conflate the
            two usages.

            IMO, the special meaning of __all__ overlaps with "is a public object".
            However, I am not sure whether defining and maintaining __all__ lists for all
            modules and classes in the docutils package is worth the effort.
            (It could help to suppress non-public objects from the "help" output and auto-generated API docs without the need of re-naming them with leading underscore.)

            The advice: "If you want to know whether an object is public, check the
            docstring" is simple, easy to follow and makes it easy to implement.

            ...

            I'd prefer to adopt SemVer, but I dropped that from the proposal for
            later conversation. Would you be happy with adopting SemVer and
            updating the language in this section to "should" rather than "must"?

            Semantic Versioning seems to emerge as the "consensus of least surprise".
            I am fine with this. I prefer to keep it without additional constraints.
            I updated the proposal in the sandbox accordingly.

            Semantic Versioning would imply that removals of deprecated attributes and objects cannot be done in minor versions after 1.0.
            This means we need to adapt announcements like "will be removed in 1.2".

             

            Last edit: Günter Milde 2022-04-06
  • engelbert gruber

    my2c

    • the "principle of least surprise" results in semantic versioning
    • shouldn't we mention css/html-dom and tex-macros in the API?
     
  • Günter Milde

    Günter Milde - 2025-04-29
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,6 @@
     As requested by @milde in https://sourceforge.net/p/docutils/bugs/441/#7043/cdb8/8742/6e7f I'm opening this issue to allow for discussion on Docutils' public API, versioning policy, and deprecation.
    +
    +[enhancement proposal 10](https://docutils.sourceforge.io/docs/eps/ep-010.html) summarizes the discussion. It will be updated with new insights and decisions until a consensus is found.
    
     This also relates to FR 87 on type annotations. 
    
     
