resolve relative URLs in e-* html #38

Open
willnorris opened this issue Aug 19, 2018 · 11 comments

Comments

@willnorris

The parsing spec does not currently include any special handling of URLs in the html value for e-* microformats. From http://microformats.org/wiki/microformats2-parsing#parsing_an_e-_property:

html: the innerHTML of the element by using the HTML spec: Serializing HTML Fragments algorithm, with leading/trailing whitespace removed.

However, some of the microformats tests do resolve relative URLs. See for example:

The major libraries are somewhat split on this. PHP and Ruby do resolve relative URLs. Go, Python, and Node do not resolve relative URLs.

Recent discussion in #microformats was inconclusive, though we didn't explore it too deeply.

At the very least, we need to synchronize the spec and the test cases. Personally, I'm leaning toward updating the spec to resolve relative URLs: otherwise they are useless in any kind of embedding use case, and a consumer may not be able to resolve them at all, since the <base> element is no longer available.
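To illustrate the <base> problem, here is a minimal Python sketch using only the standard library. The page URL, the <base href> value, and the image path are made-up examples, not taken from the test suite. It shows that the same relative URL resolves differently depending on whether the consumer still knows the document's base:

```python
from urllib.parse import urljoin

# Hypothetical values, chosen only for illustration.
page_url = "https://example.com/posts/2018/hello"
base_href = "/blog/"  # what a <base href="/blog/"> element would contribute

# The document base URL is the <base href> resolved against the page URL.
document_base = urljoin(page_url, base_href)

# What the author meant, with <base> taken into account:
with_base = urljoin(document_base, "images/cat.jpg")

# What a consumer gets if it only knows the page URL:
without_base = urljoin(page_url, "images/cat.jpg")

print(with_base)     # https://example.com/blog/images/cat.jpg
print(without_base)  # https://example.com/posts/2018/images/cat.jpg
```

A consumer that receives only the unresolved e-* html value has no way to recover the first result.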

@aaronpk
Member

aaronpk commented Aug 19, 2018

The lack of the <base> element in the parsed result, as well as the fact that some parsers already do this, makes me lean towards adding it to the spec as well.

@Zegnat
Member

Zegnat commented Aug 20, 2018

I agree. My first reaction was to keep the HTML as close to the original as possible (including relative URLs), but the embedding use-case won me over! The relative URLs cannot stand on their own in the HTML fragment, as we would have lost meaning. Resolving them makes a lot of sense!

@Zegnat
Member

Zegnat commented Aug 20, 2018

Because these issues tend to grow stale if there is no concrete proposal, here is one.

I propose replacing this line in the parsing specification for e-*:

With:

The attributes table in the HTML specification is one of the few resources that tells us where URLs are, without leaving parser implementers guessing which strings in the HTML fragment constitute relative URLs. This came out of a discussion with regard to the Webmention spec (w3c/webmention#91).

The “… following the containing document’s language’s rules …” phrasing is taken straight from other places where we resolve URLs.

I have kept the “with” phrasing because I am not sure how to best sum up the actions that need to be taken by the parser. Most implementers will probably first resolve the URLs, then run the serialisation algorithm, and then remove whitespace. That’s a slightly different order from what this proposal suggests. Let me know if there are better ways to phrase this.
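The order most implementers would probably follow (resolve, serialise, trim) can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not code from any parser mentioned in this thread: `URL_ATTRS` is a hand-picked subset of the URL-bearing attributes in the HTML specification's table, `UrlResolvingSerializer` and `e_html` are hypothetical names, and the re-serialisation is a simplification of the HTML spec's fragment serialisation algorithm.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: an illustrative subset only. The HTML spec's attributes table
# lists more, ties each attribute to specific elements, and attributes such
# as srcset need their own parsing.
URL_ATTRS = {"href", "src", "cite", "poster", "action", "formaction", "data"}


class UrlResolvingSerializer(HTMLParser):
    """Re-emits an HTML fragment, resolving URL attributes on the way."""

    def __init__(self, base_url):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <video controls>
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        # Treat <img ... /> like <img ...>; no separate end tag is emitted.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def e_html(inner_html, base_url):
    """Resolve URLs, serialise, then strip leading/trailing whitespace."""
    serializer = UrlResolvingSerializer(base_url)
    serializer.feed(inner_html)
    serializer.close()
    return "".join(serializer.out).strip()


fragment = ' <a href="/about">me</a> <img src="cat.jpg" alt="a cat"> '
print(e_html(fragment, "https://example.com/posts/1"))
# <a href="https://example.com/about">me</a> <img src="https://example.com/posts/cat.jpg" alt="a cat">
```

Here `base_url` stands for the document's base URL, that is, the page URL with any <base> element already applied. Since only attribute values change, trimming whitespace before or after resolving gives the same result, so the difference between the two orders is one of wording rather than behaviour.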

@sknebel
Member

sknebel commented Aug 20, 2018

Looks good.

Can anyone think of use-cases that would prefer URLs not to be normalized? Theoretical example I can come up with: recovery of content from a site with mf2 markup, where relative URLs to pictures, other posts, ... might be preferred.

If that's an issue, the parser should at least return the found base URL for the page, so later steps can resolve if they want. In the majority of cases, the HTML from the e-* properties has to be postprocessed anyways (filtering safe tags, replacing images with proxied versions, ...), and resolving would then add "just" another step to that.

@gRegorLove
Member

Sounds good. I'm +1 on @Zegnat's proposal.

@willnorris
Author

willnorris commented Aug 23, 2018

@Zegnat's proposal has been implemented in the Go parser, though it is not currently enabled.

willnorris added a commit to willnorris/microformats that referenced this issue Aug 23, 2018
microformats/microformats2-parsing#38 hasn't been fully adopted yet, but
it seems like it will, and two of the tests already assume this kind of
url expansion to be the case.

@snarfed
Member

snarfed commented Jan 30, 2023

This came up recently in snarfed/bridgy-fed#390 (comment). php-mf2 currently resolves relative URLs in e-*, mf2py doesn't. Shall we try again to get @Zegnat's proposal ^ into the spec? Also, mf2py and other parser contributors, any thoughts? @sknebel @tommorris @angelogladding

@jgarber623
Member

MicroMicro resolves relative URLs within e-* properties. I don’t remember building it that way for any other reason than to pass the microformats2 test suite.

@sknebel
Member

sknebel commented Nov 15, 2023

mf2py has this ready to go (and thus I guess votes in favor of this change)

@jalcine
Member

jalcine commented Nov 15, 2023

This is supported by the Rust Microformats parser and is demonstrated in the tested documentation, as this is done generically with plain text.

(Originally published at: https://jacky.wtf/2023/11/u1y4)

@jalcine
Member

jalcine commented Nov 15, 2023

I'm also in favor of this change!

(Originally published at: https://jacky.wtf/2023/11/JeNH)
