Menu

#213 Change section handling to not rely on exceptions and reparsing

None
open-fixed
nobody
None
5
2025-04-25
2025-03-26
No

While waiting for responses to an email I sent to the -users I decided to dig further into the issue and ended up implementing a fix. To recap my issue, I was having problems with a custom directive's nested parse of its content when section headers were used, because the existing code throws an EOFError up the chain to find the correct place to attach a new section.

The proposed patch changes this by adding an explicit stack of sections to the memo object instead of using the Python call stack, and changing RSTState.section() to unwind this state in-place, instead of using exception throwing to find the right place. In fact, I think this is the kind of implementation suggested as an alternative in a comment in the existing code. This also lets us get rid of memo.section_bubble_up_kludge and Line.eofcheck, which only existed to handle the exceptions.

In addition to the changed code, I've had to make six changes to the test suite.

  • Three diagnostics in test/functional/expected/standalone_rst_pseudoxml.txt now have line numbers relative to the beginning of the file rather than some other element. I think because we no longer initiate a nested parse for sections.
  • Two diagnostics in test/test_parsers/test_rst/test_targets.py have different line numbers, indicating the second conflicting label, rather than the implicit label generated by the section. I think the reason is that due to the exception throwing approach, the existing code handled the section label later than the other, even if the section is earlier in the text.
  • test/test_parsers/test_rst/test_section_headers.py has one diagnostic output less. I think this diagnostic is a duplicate, caused by the rethrowing of the exception for bubbling up the stack.
1 Attachments

Related

Bugs: #346

Discussion

  • Arne Skjærholt

    Arne Skjærholt - 2025-03-31

    On further reflection, my patch doesn't fully fix the problems involved. I can now parse section headers in a nested parse in my directive, but paragraphs following my directive get attached wrong in the tree because the parsing state of the parent parse doesn't get properly synchronized with the new state arising in the nested parse. But for cases not involving nested parses I still think it's correct.

     
  • Günter Milde

    Günter Milde - 2025-03-31

    Thank you for the patch. On a first inspection it looks like a genuine improvement.

    The changes to the test suite seem to be improvements/fixes, too.
    Line numbers in test/functional/expected/standalone_rst_pseudoxml.txt are still incorrect (but nearer to the correct value). This needs to be looked at separately (the <enumerated_list> element is missing source/line internal attributes, too).

    We will have to look into your use-case to see whether there is still something missing in the patch or whether the problem can/must be fixed in your custom directive.

     
    • Günter Milde

      Günter Milde - 2025-04-17

      Line numbers in rst_pseudoxml.txt are fixed in [r10078].

       

      Related

      Commit: [r10078]

  • Günter Milde

    Günter Milde - 2025-04-05

    The patch also solves [bugs:#346] "reST parser warns twice for short underline".

     

    Related

    Bugs: #346

  • Arne Skjærholt

    Arne Skjærholt - 2025-04-05

    Yeah, I think the parser is more straightforward this way.

    I don't think my approach is really salvageable as long as the nested parse logic creates a new state machine (I have some idea of how it could be made to work, but it's basically a completely different approach to the entire parser...). Thankfully I was able to find an alternative approach to my case: instead of reparsing n times, I inject n copies of the directive contents into the parser input, with each copy wrapped by simpler state setting/state clearing directives that mimic what happend around each nested parse in the original implementation. In some ways it's actually a simpler solution.

     
  • Günter Milde

    Günter Milde - 2025-04-09

    How about the attached version?

    • Keep check_subsection() and new_subsection() functions for backwards compatibility.
    • Restructure and simplify code.
    • No change in test/functional/expected/standalone_rst_pseudoxml.txt because commit [r10078] fixed the line numbers of "start value not ordinal-1" INFO messages.
     

    Related

    Commit: [r10078]


    Last edit: Günter Milde 2025-04-09
  • Günter Milde

    Günter Milde - 2025-04-17

    An updated version of the patch is applied in [r10093].
    Thank you for your patch.

     

    Related

    Commit: [r10093]

  • Günter Milde

    Günter Milde - 2025-04-25
    • status: open --> open-fixed
     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.