resolve relative URLs in e-* html #38
Comments
The lack of the |
I agree. My first reaction was to keep the HTML as close to the original as possible (including relative URLs), but the embedding use-case won me over! The relative URLs cannot stand on their own in the HTML fragment, as we would have lost meaning. Resolving them makes a lot of sense! |
Because these issues tend to grow stale if there is no concrete proposal, here is one. I propose replacing this line in the parsing specification for
With:
The attributes table in the HTML specification is one of the few resources that tell us where URLs are, so the parser implementer is not left guessing which strings in the HTML fragment constitute relative URLs. This came out of a discussion with regard to the Webmention spec (w3c/webmention#91). The “… following the containing document’s language’s rules …” phrasing is taken straight from other places where we resolve URLs. I have kept the “with” phrasing because I am not sure how best to sum up the actions that need to be taken by the parser. Most implementers will probably first resolve the URLs, then run the serialisation algorithm, and then remove whitespace. That’s a slightly different order from what this proposal suggests. Let me know if there are better ways to phrase this. |
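For illustration, the "resolve the URLs, then serialise, then remove whitespace" order mentioned above could be sketched in Python roughly as follows. This is a minimal sketch using only the standard library, not the spec's serialisation algorithm or any existing parser's code: the names `URL_ATTRS`, `UrlResolvingSerializer` and `resolve_fragment_urls` are made up for this example, and `URL_ATTRS` is a hand-picked subset of the URL-valued attributes rather than the full HTML attributes table (`srcset`, for instance, holds a list of URLs and is not handled here).

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: an illustrative subset of single-URL attributes, not the full table.
URL_ATTRS = {"href", "src", "cite", "data", "poster", "action", "formaction"}


class UrlResolvingSerializer(HTMLParser):
    """Re-serialises a fragment, resolving URL attributes against base_url."""

    def __init__(self, base_url):
        # Keep character references in text untouched so they round-trip.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _open_tag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute such as "hidden"
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base_url, value.strip())
            # Attribute values arrive unescaped, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax is emitted as a plain start tag.
        self._open_tag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def resolve_fragment_urls(fragment, base_url):
    """Resolve URLs, serialise, then strip leading/trailing whitespace."""
    parser = UrlResolvingSerializer(base_url)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out).strip()
```

Here `base_url` stands for whatever base the parser has already determined for the containing document; how that base is found is outside this sketch.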
Looks good. Can anyone think of use-cases that would prefer URLs not to be normalized? Theoretical example I can come up with: recovery of content from a site with mf2 markup, where relative URLs to pictures, other posts, ... might be preferred. If that's an issue, the parser should at least return the found base URL for the page, so later steps can resolve if they want. In the majority of cases, the HTML from the |
Sounds good. I'm +1 on @Zegnat's proposal. |
@Zegnat's proposal has been implemented in the go parser, though it is not currently enabled. |
fully implement recommendation in microformats/microformats2-parsing#38
microformats/microformats2-parsing#38 hasn't been fully adopted yet, but it seems like it will be, and two of the tests already assume this kind of URL expansion.
This came up recently in snarfed/bridgy-fed#390 (comment). php-mf2 currently resolves relative URLs in |
MicroMicro resolves relative URLs within |
mf2py has this ready to go (and thus I guess votes in favor of this change) |
This is supported by the Rust Microformats parser and is demonstrated in the tested documentation, as this is done generically with plain text. (Originally published at: https://jacky.wtf/2023/11/u1y4) |
I'm also in favor of this change! (Originally published at: https://jacky.wtf/2023/11/JeNH) |
The parsing spec does not currently include any special handling of URLs in the html value for e-* microformats. From http://microformats.org/wiki/microformats2-parsing#parsing_an_e-_property:
However, some of the microformats tests do resolve relative URLs. See for example:
The major libraries are somewhat split on this. PHP and Ruby do resolve relative URLs. Go, Python, and Node do not resolve relative URLs.
Recent discussion in #microformats was inconclusive (though we didn't explore it too deeply).
At the very least, we need to synchronize the spec and the test cases. Personally, I'm leaning toward updating the spec to resolve relative URLs, since otherwise they are useless in any kind of embedding use-case, and it may not even be possible to resolve them later, since you no longer have the <base> element.
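To make the point about the missing base concrete, here is a small Python sketch using the standard library's `urljoin`. The page URL, the `<base href>` value and the image path are all hypothetical values chosen for this example.

```python
from urllib.parse import urljoin

# Hypothetical values, for illustration only.
page_url = "https://example.com/posts/2016/hello"  # URL the document was fetched from
base_href = "/static/"                             # href of the document's <base> element
relative = "img/cat.png"                           # relative URL found inside the e-* html

# Inside the original document, <base> is itself resolved against the page URL.
effective_base = urljoin(page_url, base_href)
with_base = urljoin(effective_base, relative)

# A consumer that only knows the page URL would resolve differently.
without_base = urljoin(page_url, relative)

print(with_base)
print(without_base)
```

The two results differ, so a consumer holding only the unresolved fragment and the page URL cannot recover what the author's markup pointed to.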