Escaping double hyphen within :ref: role doesn't work and a dash is rendered. #11492

Open
bgusach opened this issue Jul 19, 2023 · 2 comments

@bgusach
bgusach commented Jul 19, 2023

Describe the bug

The role

:ref:`\--interface <some-ref>`

should be rendered as --interface (with two hyphens) but it is rendered as –interface (with one en-dash) instead.

Outside a :ref:, the escaping works as expected.

The following funky alternatives

:ref:`\\--interface <some-ref>` 
:ref:`\\-\\-interface <some-ref>` 

don't work either.

The only workaround I've found has been to define smartquotes = False in conf.py.
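A narrower variant of this workaround is sketched below. It assumes the Sphinx `smartquotes_action` option behaves as documented: its default `'qDe'` educates quotes (`q`), dashes (`D`) and ellipses (`e`). Leaving out the `D` should then stop only the dash conversion, while quotes and ellipses are still educated. This has not been tested against this particular issue.

```python
# conf.py (sketch): keep the SmartQuotes transform enabled, but drop the
# "D" (dash) action so that "--" and "---" are left untouched everywhere.
smartquotes = True
smartquotes_action = "qe"  # q = quotes, e = ellipses; no dash handling
```

Like `smartquotes = False`, this is project-wide: `---` will no longer become an em-dash anywhere either.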

(Since other roles like :code: handle -- in a different way, I'm inclined to think that this is a Sphinx and not a Docutils problem, but I may very well be wrong.)

How to Reproduce

  • Create a basic index.rst:
:ref:`--interface \--interface \-\-interface \\--interface \\-\\-interface  <some-ref>`
  • Run make html (ignore the undefined-label warning).

  • Visit index.html.

  • See –interface –interface –interface –interface –interface

Environment Information

sphinx==7.0.1

Sphinx extensions

No response

Additional context

No response

@picnixz
Member

picnixz commented Jul 24, 2023

TL;DR: It seems that this works:

:ref:`\\\\-\\\\-interface <label>`

This is because an unescape step somewhere along the way swallows one extra level of escaping. More precisely, if you look at sphinx.util.docutils.ReferenceRole, the input \\\\-\\\\-interface <label> is internally parsed as

\x00\\x00\-\x00\\x00\-interface

and the NUL bytes in front of each \ are then removed, so self.title ends up as \\-\\-interface. The doctree now looks like:

<paragraph>
  <pending_xref refdoc="index" refdomain="std" refexplicit="True" reftarget="label" reftype="ref" refwarn="True">
    <inline classes="xref std std-ref">
      \\-\\-interface
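The escaping steps described above can be emulated in a few lines of plain Python. This is a simplified re-implementation, not the actual docutils or Sphinx code: the hypothetical `escape_to_null` mimics how a backslash escape `\X` is stored as NUL followed by `X`, and the hypothetical `remove_nulls` mimics the unescape applied to the explicit title (ignoring docutils' special handling of escaped whitespace). It also shows why a single `\--` loses its protection.

```python
NUL = "\x00"


def escape_to_null(text: str) -> str:
    r"""Store each backslash escape ``\X`` as NUL followed by ``X`` (simplified)."""
    parts = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # Keep the escaped character, marked by a preceding NUL.
            parts.append(NUL + text[i + 1 : i + 2])
            i += 2
        else:
            parts.append(text[i])
            i += 1
    return "".join(parts)


def remove_nulls(text: str) -> str:
    """Drop the NUL markers, as the title unescape does (whitespace rules ignored)."""
    return text.replace(NUL, "")


if __name__ == "__main__":
    # Four backslashes per hyphen survive as two literal backslashes.
    print(remove_nulls(escape_to_null(r"\\\\-\\\\-interface")))  # prints: \\-\\-interface
    # A single escape loses its NUL marker, leaving a bare "--".
    print(remove_nulls(escape_to_null(r"\--interface")))  # prints: --interface
```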

Then (it's not finished yet!) sphinx.transforms.SphinxSmartQuotes is applied. It inherits from the SmartQuotes transform in docutils/transforms/universal.py, which is responsible for turning double dashes into en-dashes.

AFAICT, this transformation finds all nodes.TextElement nodes and rewrites their text according to some predefined rules. However, when I debugged the flow, it appeared that my node is actually processed twice. The reason is that docutils (not Sphinx anymore) collects every nodes.TextElement in the document, and the <paragraph> node containing the <pending_xref> and the <inline> node nested inside that <pending_xref> are treated as two distinct nodes.

In particular, when you process the <paragraph> node, you also process the internal <inline> and get

<paragraph>
  <pending_xref refdoc="index" refdomain="std" refexplicit="True" reftarget="label" reftype="ref" refwarn="True">
    <inline classes="xref std std-ref">
      \-\-interface

Then, you process the <inline> node and get (finally)

<paragraph>
  <pending_xref refdoc="index" refdomain="std" refexplicit="True" reftarget="label" reftype="ref" refwarn="True">
    <inline classes="xref std std-ref">
      --interface

With only one level of escaping, processing <paragraph> already removes the backslashes, so processing <inline> afterwards produces the unwanted en-dash.
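The double pass described above can likewise be modelled with a toy function. `educate_dashes` is a hypothetical stand-in for the SmartQuotes dash handling, not the real transform: it hides backslash escapes behind placeholders, educates `---` and `--`, then restores the hidden characters without their backslash. Only `\\`, `\-`, `--` and `---` are handled. Because each call consumes one level of escaping, calling it once for <paragraph> and once for <inline> reproduces both outcomes from the thread.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Escapes are hidden behind placeholders before dashes are educated.
# Order matters: an escaped backslash must be handled before "\-".
_HIDE = [("\\\\", "&#92;"), ("\\-", "&#45;")]


def educate_dashes(text: str) -> str:
    """One toy SmartQuotes-style pass: "---" -> em-dash, "--" -> en-dash."""
    for escape, placeholder in _HIDE:
        text = text.replace(escape, placeholder)
    text = text.replace("---", EM_DASH).replace("--", EN_DASH)
    # Restore hidden characters WITHOUT their backslash: one escape level is consumed.
    for escape, placeholder in _HIDE:
        text = text.replace(placeholder, escape[1])
    return text


def paragraph_then_inline(title: str) -> str:
    """Apply the pass twice, as happens for <paragraph> and then <inline>."""
    return educate_dashes(educate_dashes(title))


if __name__ == "__main__":
    print(paragraph_then_inline(r"\\-\\-interface"))  # prints: --interface
    print(paragraph_then_inline(r"\-\-interface"))  # prints: –interface
```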


I don't know whether this is a flaw in the design of docutils' SmartQuotes transform, but I think we can work towards correcting it. It would be natural for \- to actually escape the -. In particular, I think we need to change the way XRefRole parses the title: first transform occurrences of \-\-, \-\-\- and \.\.\. (namely all sequences that become smart characters) into NUL-escaped forms such as \x00\-\x00\-, so that we never have to worry about the SmartQuotes transform.

I think this logic should only apply to explicit titles. Currently, explicit titles are unescaped, but I think we can simply keep them as they are. Unescaping them means removing the NUL bytes (which generally come from an escaping backslash \). So we should just change

self.title = unescape(matched.group(1))

into

self.title = matched.group(1)

@bgusach
Author

bgusach commented Jul 25, 2023

The \\\\-\\\\- trick works. Not a real solution, but I can live with that for a while. Thanks 👍️

AA-Turner added this to the some future version milestone Aug 11, 2023