Skip to content

remove superfluous leading/trailing whitespace #181

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 27, 2025

Conversation

chrispy-snps
Copy link
Collaborator

@chrispy-snps chrispy-snps commented Jan 27, 2025

Removes superfluous leading/trailing whitespace (EDITED to final behavior):

  • The BeautifulSoup conversion function now strips leading/trailing newlines from the final Markdown result by default.
    • This is controlled by the new strip_document configuration option, which defaults to STRIP.
  • Unit tests are run with strip_document == None so that leading/trailing separation newline behavior is tested.
  • The <p>, <caption>, and <figcaption> elements now always return the minimum required leading/trailing newlines, even when their child content contributes additional newlines.
  • The unit test for <figcaption> is renamed and moved.

In the convert__document_() function, I decided not to remove trailing newlines by default (LSTRIP) so that content can be appended to the result without ambiguity. However, removing both leading and trailing newlines is also worth considering.

This is proactive cleanup that will hopefully make #162 easier.

@chrispy-snps chrispy-snps requested a review from AlexVonB January 27, 2025 02:33
@chrispy-snps
Copy link
Collaborator Author

@AlexVonB - the unit tests no longer validate the expected leading newlines for each element. I am not sure I like this.

I am thinking of adding an option to control whether the final document-level newline stripping is performed, so that it can be disabled in the unit tests and the leading newlines can be checked. What do you think?

@chrispy-snps chrispy-snps force-pushed the chrispy/remove-extra-whitespace branch from 80239dc to dd90a52 Compare January 27, 2025 14:37
@chrispy-snps chrispy-snps force-pushed the chrispy/remove-extra-whitespace branch from dd90a52 to dd251c6 Compare January 27, 2025 15:03
@chrispy-snps
Copy link
Collaborator Author

After more thought, I made the document-stripping behavior configurable, then reverted the unit tests to their original behavior of testing leading/trailing newlines.

Copy link
Collaborator

@AlexVonB AlexVonB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently the conversion of <html><body><p>foo<p></body></html> returns \n\nfoo\n\n, correct? Then, removing the first two newlines seems sensible.

Be aware that making LSTRIP the default changes the current behavior. We should think about increasing the minor version instead just the patch (makes me think about going out of 0.x some time? :D). Maybe even keeping None as default could be a thing. Let me know what you think!

Thanks for your work, again! :)

@chrispy-snps
Copy link
Collaborator Author

@AlexVonB - yep, current behavior is newlines on both sides, which never really felt right to me:

md("<html><body><p>foo<p></body></html>")
# '\n\nfoo\n\n'

Given your openness to having a default of strip_document = STRIP (both sides), I am inclined to go with that too. Users can always configure this setting explicitly if they have an explicit need. I will make the change.

@chrispy-snps chrispy-snps force-pushed the chrispy/remove-extra-whitespace branch from c9470e0 to 1586f98 Compare January 27, 2025 16:39
@chrispy-snps
Copy link
Collaborator Author

@AlexVonB - can you give one final sanity check review please?

@AlexVonB
Copy link
Collaborator

Looks good! Thanks again!

@chrispy-snps chrispy-snps merged commit ae0597d into develop Jan 27, 2025
1 check passed
@chrispy-snps chrispy-snps deleted the chrispy/remove-extra-whitespace branch February 20, 2025 01:01
Wuhall pushed a commit to Wuhall/python-markdownify that referenced this pull request May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants