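Backticks enclose interpreted text, which has a role. Without an explicit role the default role applies; otherwise the role is given as :<role>:`text`. An underscore in front (_`text`) acts as the special target role. For a single word the backticks can be dropped. _`__init__` should produce a target named "__init__".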
But instead the produced target is "init".
The backtick avoids ambiguity. There is no need for this behavior.
Diff:
Please be careful with using raw markup in a web form like this. SourceForge expects MarkDown, which has enough similarities to reStructuredText that the markup will be interpreted/misinterpreted. Use MarkDown to quote any markup, and check that the result makes sense when rendered (use the preview function).
Last edit: Günter Milde 2025-04-29
When you say, "There is no need for this behavior", what behavior do you mean, exactly?
It works fine for me. This input:
Produces this output:
The target name is `__init__`. The ID drops the underscores, for the reasons explained in docutils.nodes.Element and docutils.nodes.make_id, e.g.:
Last edit: Günter Milde 2025-04-29
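The normalization mentioned above can be approximated with the standard library alone. The sketch below is a simplified reimplementation for illustration, not the Docutils source: `sketch_make_id` is a hypothetical name, and the real `docutils.nodes.make_id` additionally maps some special characters through translation tables.

```python
import re
import unicodedata

# Simplified, stdlib-only approximation of the legacy name-to-ID rule.
# Assumption: this mirrors the documented behaviour (lowercase ASCII letters,
# digits and hyphens only), not every detail of docutils.nodes.make_id().
_NON_ID_CHARS = re.compile(r"[^a-z0-9]+")
_NON_ID_AT_ENDS = re.compile(r"^[-0-9]+|-+$")


def sketch_make_id(name: str) -> str:
    """Approximate the legacy normalization of a reference name to an ID."""
    ident = name.lower()
    # Decompose accented characters and drop everything outside ASCII.
    ident = unicodedata.normalize("NFKD", ident).encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace, then turn every run of other characters into one hyphen.
    ident = _NON_ID_CHARS.sub("-", " ".join(ident.split()))
    # Strip leading hyphens/digits and trailing hyphens.
    return _NON_ID_AT_ENDS.sub("", ident)


print(sketch_make_id("__init__"))       # init
print(sketch_make_id("some_title_id"))  # some-title-id
```

Underscores are not in the allowed character set, so they first become hyphens and are then stripped at both ends, which is why the target name `__init__` ends up with the ID `init`.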
According to
https://www.w3.org/TR/CSS21/syndata.html#characters
an identifier can start with two underscores in CSS.
HTML5 allows the id value to start with two underscores (https://html.spec.whatwg.org/multipage/dom.html#the-id-attribute).
HTML5 id is specified to not contain spaces, but some browsers do support spaces nevertheless.
HTML5 does not specify why it disallows spaces. It should therefore allow spaces.
I made a related post about docutils changing IDs in 11/2018: https://sourceforge.net/p/docutils/mailman/message/36453416/
My position is this:
The id should not be changed.
Docutils should even keep spaces despite HTML5 disallowing them.
If the user runs into a problem with a browser, he will change the id himself and know about it.
Maybe he converts to just pdf, anyway.
To summarize:
RST is not html and does not need restrictions from HTML (or CSS) altogether.
Docutils should develop in that direction.
Relaxing rules does not produce backward incompatibility, either.
Ticket moved from /p/docutils/bugs/379/
In rST/Docutils, it is a bit more complicated:
Docutils doctree elements may have multiple ids and names.
In the reStructuredText source, only reference names are used for naming
elements as well as referring to them. IDs are only used in generated documents.
Reference names may be auto-derived from the content (e.g. section
titles) or specified by the author via rST syntax (:name: option of
directives, content of hyperlink targets, label of footnotes or citations).
IDs_ are generated by Docutils (sometimes using names as base) when
parsing rST or in transformations.
To achieve this, the id must be valid in all output formats supported by
Docutils (HTML4.1/XHTML1, HTML5, LaTeX, troff (manpage), XML, ODF/ODT).
HTML4.1:
IDs must begin with a letter `[A-Za-z]` and may be followed by any number of letters, digits `[0-9]`, or any of the characters `-_:.`
HTML5:
no whitespace
LaTeX:
only ASCII characters (32-127) except "%~#{}"
* "{" and "}" might be used if balanced but this is not recommended.
* Use of certain LaTeX packages results in more exceptions.
* Spaces are allowed.
* With XeTeX/LuaTeX, Unicode characters are allowed, too.
https://tex.stackexchange.com/questions/18311/what-are-the-valid-names-as-labels
ODT/ODF: <to be completed>
troff: <to be completed>
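The per-format restrictions listed above can be written down as small checks. This is a minimal sketch that only encodes the rules as stated in this comment; the function names are hypothetical, and ODT/ODF and troff are left out because their rules are not given here.

```python
import re

# HTML4.1: a letter followed by letters, digits, or any of "-_:."
HTML4_ID = re.compile(r"[A-Za-z][-_:.A-Za-z0-9]*")
# HTML5: the only restriction modelled here is "no whitespace".
ASCII_WHITESPACE = set(" \t\n\f\r")
# LaTeX (pdflatex): ASCII 32-127 except these characters.
LATEX_FORBIDDEN = set("%~#{}")


def valid_html4(ident: str) -> bool:
    return HTML4_ID.fullmatch(ident) is not None


def valid_html5(ident: str) -> bool:
    return bool(ident) and not any(ch in ASCII_WHITESPACE for ch in ident)


def valid_latex(ident: str) -> bool:
    return bool(ident) and all(
        32 <= ord(ch) <= 127 and ch not in LATEX_FORBIDDEN for ch in ident
    )


for candidate in ("__init__", "some id", "init"):
    print(candidate, valid_html4(candidate), valid_html5(candidate), valid_latex(candidate))
```

Under these rules `__init__` is acceptable for HTML5 and LaTeX but not for HTML4.1, because of the leading underscore.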
Internal (rST source and included files/parent documents):
use the reference name. This works independent of the output format.
External:
HTML: Use the generated id (when unsure about the transformation of a
given name to id, look it up in the output).
LaTeX: Use the id as label (e.g. in `\ref{}`). This works only if the
external LaTeX source is combined with the Docutils-generated
LaTeX source (i.e. one must include the other or both included in a
common parent).
PDF: named destinations are currently not supported in PDFs
generated from Docutils-generated LaTeX.
https://tex.stackexchange.com/questions/213860/how-to-generate-a-named-destination-in-pdf
The id is (currently) generated once and used unchanged by the writers.
Docutils policy is to create valid output. Until this restriction is
lifted in the HTML5 standard, Docutils will not use spaces in HTML-IDs.
Spaces are allowed in reference names.
The author cannot change IDs nor implicit reference names directly. If we
kept spaces, any document with a section title containing whitespace
would also contain spaces in the `id` of the corresponding section element.
Even worse: accented characters, umlauts, Greek, Cyrillic, etc. in section
titles would lead to compilation errors with `pdflatex`.
This is why the internal identifiers (reference names) don't
have these limitations. The rules for reference names (whitespace
normalization and downcasing) are solely based on practicability for rST.
Identifiers in the generated documents must comply with the restrictions of
the output document format.
There are two alternatives:
a) Keep ids identical across output formats. This would allow only the
intersection of valid element identifiers.
We could lift the restrictions of CSS1, as generated documents would
still be valid XHTML1 and CSS selectors may use escaping or `[argument]` syntax.
This would relax the requirements to complying with the regexp
`[A-Za-z][-_:.A-Za-z0-9]*` (i.e. also allow underscore, colon, and full stop).
b) Allow less restrictive identifiers in some formats:
HTML is the format most probably linked to.
The "html5" writer could use the name as ID, just replacing spaces.
This would allow external links like
http://example.com/parrot.html#1.Ιανουάριος.
Or the restriction on the first character may be dropped with an exception
for "html4css1".
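The two alternatives can be contrasted in a short sketch. Both function names are hypothetical, and the choice of a hyphen as the replacement character is an assumption (the comment only says "replacing spaces"). Neither function handles names that reduce to an empty string.

```python
import re
import unicodedata


def relaxed_common_id(name: str) -> str:
    """Alternative a): one ID for all formats, matching [A-Za-z][-_:.A-Za-z0-9]*."""
    # Drop non-ASCII characters after decomposing accented letters.
    ident = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Replace whitespace and any other disallowed character run with a hyphen.
    ident = re.sub(r"[^-_:.A-Za-z0-9]+", "-", "-".join(ident.split()))
    # The first character must still be a letter.
    return re.sub(r"^[^A-Za-z]+", "", ident)


def html5_id(name: str) -> str:
    """Alternative b): keep the name and only replace whitespace."""
    return "-".join(name.split())


for name in ("__init__", "some_title_id", "some title"):
    print(name, "->", relaxed_common_id(name), "|", html5_id(name))
```

With alternative a) the underscores inside and at the end of a name survive, but the leading ones are still dropped because of the first-character rule; only alternative b) keeps `__init__` unchanged.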
No problem for internal links (unless we also change the rules for reference names).
However, external links adapted to the current rules may break.
Example: a document, `parrot.rst`, contains::

and I link to this section from somewhere on the net with the URL
http://example.org/parrot.html#schoner-titel-warum-nicht.
This link will be broken after re-processing the unchanged source with a Docutils
version with relaxed id-rules.
Therefore, I would only change the rules after careful consideration and an
advance warning. Possibly with an opt-in setting.
Last edit: Günter Milde 2025-04-29
I've abbreviated the general concept of identifier with ID.
In this general meaning a reference name is an ID,
because you reference something by uniquely identifying it.
If in docutils there are more `reference names` and `ids`,
then there are more ways to reference an item.
That is OK.
I was referring only to user chosen `reference name`_.
Let's keep out IDs generated from headers or from `:name:`.
I personally never rely on these generated IDs,
because I don't know them.
Instead I put
.. _`some_title_id`:
in front of a header.
User chosen target IDs (`reference name`_ in rst) should not be changed.
How are more reference names translated to html,
e.g. for the above additional `some_title_id`?
More IDs would allow to keep the legacy ID and add
the unchanged user `reference name` as additional ID.
Else one could add a `docutils.conf` setting to tell docutils which method to use.
About multiple IDs in html:
https://stackoverflow.com/questions/192048/can-an-html-element-have-multiple-ids
See comment by BoltClock or the answer by tvanfosson.
consistent identifiers, the same rules must apply to all output formats.
Anchors with an unchecked user-specified ID value could be specified using raw input, but this is not recommended.
Try yourself:
If you export to Docutils-XML or ~pseudoxml, you will see the three names and ids of the note element. In the HTML, spans are used as anchors for the additional identifiers.
Last edit: Günter Milde 2025-04-29
Docutils has versions.
A new version is allowed to behave differently, according to semantic versioning.
Everyone knows that.
If someone uses a new version of docutils,
it is that one's responsibility to integrate it into its context.
Docutils should develop with the associated standards.
HTML has standard 5 now.
IDs should be modified only according to standard 5.
This means that only spaces can be replaced
when deriving HTML IDs.
There is one problem, though: "Cool URIs don't change"
(https://www.w3.org/Provider/Style/URI.html).
When a new Docutils version produces different URIs for the same input, we
should offer users a way to keep the old URIs.
HTML comes in many different versions. Docutils supports HTML5 with the
"html5_polyglot" writer and XHTML1.1/transitional with the default writer
"html4css1". The default may change in future.
Identifier keys must be valid in all supported output formats.
Therefore, they must comply with restrictions in the
respective output formats (HTML4.1, HTML5, polyglot HTML,
LaTeX, ODT, troff (manpage), XML).
__ http://www.w3.org/TR/html401/types.html#type-name
__ https://www.w3.org/TR/html50/dom.html#the-id-attribute
__ https://www.w3.org/TR/html-polyglot/#id-attribute
__ https://tex.stackexchange.com/questions/18311/what-are-the-valid-names-as-labels
__ https://help.libreoffice.org/6.3/en-US/text/swriter/01/04040000.html?DbPAR=WRITER#bm_id4974211
__ https://www.w3.org/TR/REC-xml/#id
We may want to keep the "one ID format for all output formats". Then only
the underscore (`_`) may be allowed in addition to the current transformation.
+1 one rule is easier to remember than a set of different rules.
-1 IDs must keep to a restrictive rule even in more relaxed output formats.
Alternatively, we may allow different identifier transformations for each
output format:
+1 ID-transformation follows (almost) the relaxed rules of the output format.
-1 More complex setup.
-1 ID value used in the output is even harder to predict.
A possible implementation would be via a new "identifier_restrictions"
configuration setting that takes a list of rule sets (CSS1, HTML4, HTML5,
XML, LaTeX, XeTeX/LuaTeX, ODT, troff) and combines them to form the required
transformation.
Examples:
The current transformation would be `identifier_restrictions: HTML4,CSS1`.
The "html5polyglot" section could use `identifier_restrictions: XML`, as
polyglot HTML requires valid XML identifiers.
A user may override this in a config file or with `rst2html5 --identifier-restrictions=HTML5`.
Last edit: Günter Milde 2025-04-29
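How a list of rule sets might combine can be sketched as follows. Everything here is hypothetical: the proposed setting does not exist in Docutils, `RULE_SETS` and `restrict_identifier` are illustrative names, and only three rule sets are modelled with simplified ASCII-only data.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
ASCII_ALNUM = ASCII_LETTERS | set(string.digits)

# Hypothetical rule data: each rule set maps to two predicates,
# (allowed as first character, allowed as any later character).
RULE_SETS = {
    "HTML4": (lambda c: c in ASCII_LETTERS, lambda c: c in ASCII_ALNUM or c in "-_:."),
    "CSS1": (lambda c: c in ASCII_LETTERS, lambda c: c in ASCII_ALNUM or c == "-"),
    "HTML5": (lambda c: not c.isspace(), lambda c: not c.isspace()),
}


def restrict_identifier(name, restrictions):
    """Replace characters rejected by any selected rule set with hyphens."""
    rules = [RULE_SETS[key] for key in restrictions]
    # A character is kept only if every selected rule set accepts it.
    ident = "".join(c if all(ok(c) for _, ok in rules) else "-" for c in name)
    # Drop leading characters until one is a valid start in every rule set.
    while ident and not all(start(ident[0]) for start, _ in rules):
        ident = ident[1:]
    return ident.rstrip("-")


print(restrict_identifier("__init__", ["HTML4", "CSS1"]))  # init
print(restrict_identifier("__init__", ["HTML5"]))          # __init__
```

Combining rule sets by requiring every predicate to accept a character gives the intersection of the allowed characters, so adding a rule set can only make the result more restrictive. Unlike the current normalization, this sketch neither lowercases the name nor collapses runs of hyphens.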
This is a nice solution. I would also have a special `--identifier-restrictions=none` to turn off all ID mappings.
I attach an experimental implementation draft and tests for exploration.
I like this:
It allows to use the same ID for output formats that support it,
which are a lot considering HTML5, ODT, XeTeX and XML.
It also means that the generated documents of these formats all have the same ID for the same content,
including the RST source
It stores the ID language restrictions of different target formats within docutils
Regarding API, I would make your `trim_name()` the new `make_id()`:
The `has_prefix` shouldn't be needed because it is determined by the ID format data `id_start` and `id_char`.
In the command line interface I would also default to `legacy`,
because of "Cool URIs don't change" and to avoid the necessity to change people's scripts.
I did not compare the ID language data in your py file with the documentation of the corresponding formats.
`__init__`: becomes `<p id="init">` instead of `<p id="__init__">` --> allow more characters when transforming "names" to "ids".
See also bug #207 (closed as a duplicate of this request).
On 2024-03-25, a user posted a request to docutils-develop with a use case where the identifier-normalization of class names stands in the way:
One may also consider uncoupling the handling of class names and
identifiers, as restrictions in the output formats may differ and
class names are more directly user-set, so there may be fewer side effects.
Last edit: Günter Milde 2024-03-28