
feat: implemented JSX support as text/jsx+xml #96


Draft · kyr0 wants to merge 2 commits into main

Conversation

@kyr0 commented Nov 14, 2021

Hi Andrea,

...hope you're doing well! I remember reading your emails on the ES mailing list years ago :)

Well... fast-forward: a few days ago I discovered your beautiful DOM implementation here, and I really love it. It's so elegant! Since I'm working on a new SSG concept, I use it to parse and (post-)process markup for static site generation. As my tool supports JSX/TSX as the primary input templating format, using linkedom showed excellent performance and a beautiful API, but of course JSX/TSX support was still lacking in this lib, which caused some trouble for full JSX/TSX feature support -- until today :)

So, I've just implemented support for that: text/jsx+xml. The new mime type:

  • basically behaves like XML (as is the nature of JSX/TSX)
  • but doesn't escape HTML entities by default. I want to pass the JSX/TSX directly into a transpiler for further transformation; turning syntax elements like <, > etc. into entities causes hiccups there, since the JSX grammar obviously doesn't support that
  • and doesn't quote attribute values that are JSON-like, e.g. <foo bar={callFn()} /> -- wrapping them in quotes would of course lead to misinterpretation in the transpiler's later parse steps
  • and doesn't add the XML doctype on top of document.toString() (see the short usage sketch after this list)
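
To make the intended behavior concrete, here's a minimal round-trip sketch (assuming linkedom's DOMParser export; the expected output follows this PR's description, not a released API):

import {DOMParser} from 'linkedom';

// parse JSX-ish markup with the new mime type from this PR
const document = new DOMParser().parseFromString(
  '<foo bar={callFn()} />',
  'text/jsx+xml'
);

// the JSON-like attribute value stays unquoted and no XML doctype is
// prepended, so the output can go straight into a JSX transpiler
console.log(document.toString()); // <foo bar={callFn()} />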

For the implementation, I just followed the existing code style and architecture. Another mime type has been added, and you'll find some additional branching logic in certain toString() implementations.

The reason I introduced the escapeHtmlEntities flag on all mime types, instead of inverting the control logic, is that enabling the feature by default seems logical to me -- so it reads more like "you can deactivate it if really needed". However, I think this is a flag the user might want more flexible control over. I had the idea of passing an optional argument to parseFromString() so that details could be controlled more precisely (overriding the default mime-type config); however, I think that would be a change worth another PR -- and I don't actually need it right now.
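
Purely hypothetical, but the override could look something like this (the third argument does not exist; field names follow this PR):

// not implemented -- just the shape of the idea
const document = new DOMParser().parseFromString(markup, 'text/jsx+xml', {
  // per-call overrides of the mime-type defaults
  escapeHtmlEntities: true,
  unquotedJsonAttributes: false
});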

I've also added all necessary tests. You'll find the PR builds with 100% test coverage, checking all changed/added behavior as well.

I'd love it if you could find time to review and maybe merge this one + release a new version.
I have the package linked locally with my changes, but I'd love to release an alpha version of my SSG concept soon -- and of course I don't want to fork/release this lib for that.

Thank you in advance!
Aron

@@ -40,9 +40,10 @@ export class Attr extends Node {
   }

   toString() {
-    const {name, [VALUE]: value} = this;
+    const {ownerDocument, name, [VALUE]: value} = this;
+    const doubleQuote = ownerDocument[MIME].unquotedJsonAttributes && /^\{(.[\s\S]?)+\}$/.test(value) ? '' : '"';
Owner:
what is /^\{(.[\s\S]?)+\}$/ for? If it's about starting with { and ending with }, I believe /^\{[\s\S]*\}$/ is what you are after?

Author:
Yep, I'll try that; should be more precise than my code there. It was late when I wrote that RegExp :)
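
For the record, a quick node check of where the two patterns differ (my nested quantifier can also backtrack badly on long non-matching values):

const original = /^\{(.[\s\S]?)+\}$/;
const simpler  = /^\{[\s\S]*\}$/;

original.test('{callFn()}'); // true
simpler.test('{callFn()}');  // true
original.test('{}');         // false -- the group needs at least one inner character
simpler.test('{}');          // true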

     return emptyAttributes.has(name) && !value ?
-      name : `${name}="${value.replace(QUOTE, '&quot;')}"`;
+      name : `${name}=${doubleQuote}${value.replace(QUOTE, doubleQuote ? '&quot;' : '"')}${doubleQuote}`;
Owner:
I would have instead:

const useQuote = !(ownerDocument[MIME].unquotedJsonAttributes && /^\{(.[\s\S]?)+\}$/.test(value));
const quote = useQuote ? '"' : '';

// ...

`${name}=${quote}${value.replace(QUOTE, useQuote ? '&quot;' : '"')}${quote}`

Author:
That is def. more readable. Good point.

docType: '<?xml version="1.0" encoding="utf-8"?>',
ignoreCase: false,
escapeHtmlEntities: false,
unquotedJsonAttributes: true,
Owner:
2 things I'd rather change here, or discuss:

  1. I don't like checking emptiness ... all objects here are homogeneous: if we add a single boolean field somewhere, I want it to be an explicit boolean field everywhere else too, as it is for escapeHtmlEntities ... so I'd add unquotedJsonAttributes: false in the other cases
  2. I don't understand why it's called unquotedJsonAttributes ... I understand it's virtually JSON, but practically that's an unquoted attribute ... so ... since we want to allow only JSON, which results in it being unquoted, I think a field name such as jsonAttributes would better reflect the intent/purpose? ... 'cause of course those gotta be unquoted, otherwise it'd be invalid JSON, does that make sense?

Author:
  1. agreed, that makes sense to me -- something like the sketch below
  2. I think jsonAttributes describes the purpose to be achieved, while unquotedJsonAttributes describes what is happening. I like both ways of naming. Let's go with jsonAttributes then
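
So the mime table would end up homogeneous, something like this (the text/html values are my reading of linkedom's existing defaults, not a quote of the source):

const mimeDefaults = {
  'text/html': {
    docType: '<!DOCTYPE html>',
    ignoreCase: true,
    escapeHtmlEntities: true,
    jsonAttributes: false   // explicitly false, never omitted
  },
  'text/jsx+xml': {
    docType: '<?xml version="1.0" encoding="utf-8"?>',
    ignoreCase: false,
    escapeHtmlEntities: false,
    jsonAttributes: true    // JSON-like values serialize unquoted
  }
};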

@WebReflection (Owner):

@kyr0 besides my review comments, I am not sure I like where this is going or, to be more precise, I don't understand what this is adding compared to jsx2tag, which allows real JSX out of the box without really needing to change much.

Most notably, I don't understand how you would address <LocalComponentToCall /> there, or any reference within {brackets} ... do you mind providing a meta/idea of what you are trying to solve?

Thank you!

@kyr0 commented Nov 16, 2021

@WebReflection I see. First of all, thank you for taking so much time to review, for the good feedback on my code, and for pointing me to potential alternatives! Very much appreciated!

So, what I'm doing is pre-processing JSX/HTML markup before transpiling it in a second step. The pre-processing step mutates elements and their CDATA based on certain rules; especially <script> and <style> CDATA is mutated. Therefore I need a parser capable of parsing JSX/HTML, and a DOM manipulation library capable of handling that use-case and of re-generating markup from it.
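
In code, that step looks roughly like this (applySsgRules stands in for my actual rules and isn't a real function):

import {DOMParser} from 'linkedom';

const document = new DOMParser().parseFromString(source, 'text/jsx+xml');

// mutate <script>/<style> text per the SSG's rules before transpilation
for (const node of document.querySelectorAll('script, style'))
  node.textContent = applySsgRules(node.textContent);

const processed = document.toString(); // handed to the transpiler next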

As htmlparser2 is already doing a great job here because the parser is "forgiving", and because linkedom is a great DOM API lib -- both being focused projects, each solving one specific problem (parsing, and a lightweight DOM API) -- these seemed perfectly suited for my use-case.

jsx2tag is a nice project as well. From what I can see, you're solving a similar, but not the same, problem there.
jsx2tag supports JSX, while I'd need something like JSXandHTML2TAG. The template grammar I'm working on is a hybrid. It supports something like this:

<style>
    body { background-color: '#000' }
</style>

as well as <script> in the middle of JSX/TSX, HTML comments, etc.

Of course, there is no need to address that specifically in linkedom. linkedom handles JSX quite well by default already, except that unquoted attribute values weren't re-generated the same way they were read. There's no need for any additional logic for my specific case; any special logic for my hybrid grammar is handled in my project, of course. In the libs, everything should follow the defined standards (well, subjective opinion).

I think having JSX/TSX support in linkedom this way is a nice feature for the library that fits in very well with the existing supported XML(-like) grammar variations. Of course, this is just my subjective opinion.

@WebReflection (Owner):

@kyr0 do you have any example of how this would actually work? The test just shows that JSX can be outputted, but what would you do with that output? Anyway, because the mime type is a non-existent one, I don't have particular problems enabling this use case too; I just need to approve those changes and see if there's any relevant performance issue, but at a quick look it doesn't seem like there should be one.

@kyr0 commented Nov 17, 2021

@WebReflection Thank you. Well, I pre-filter the HTML comments; then, using linkedom, I numb the <script> and <style> #text by wrapping it in {``}. For that, I was using linkedom in step (1). After that comes the part of transpiling and evaluating the code (SSG).

One thing that happens during evaluation is that a custom jsxFactory function is called, which re-creates a DOM structure from the objects produced by the typical functional tree transform; this happens, again, using the linkedom API (2). Finally, I do some post-processing with the help of linkedom and print the HTML markup; also using linkedom (3). A loose sketch of such a factory follows.
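
Roughly sketched (not my actual implementation, just the shape of it):

import {parseHTML} from 'linkedom';

const {document} = parseHTML('<!doctype html><html><body></body></html>');

// the transpiler emits calls like jsxFactory(tag, props, ...children)
function jsxFactory(tag, props, ...children) {
  // local components are plain functions: call them to get their subtree
  if (typeof tag === 'function')
    return tag({...props, children});
  const element = document.createElement(tag);
  for (const [key, value] of Object.entries(props || {}))
    element.setAttribute(key, value);
  // strings and nodes alike can be appended directly
  element.append(...children.flat());
  return element;
}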

Well, yesterday night I was re-thinking my implementation of step (1), because I found some issues with more complex JSX use-cases upstream in the htmlparser2 library. At some point htmlparser2 stops being JSX-grammar compliant (of course, because it was never built with the intention of dealing with that grammar). So I forked it too and fixed it there as well -- added JSX support.

Locally I have the tests running almost 100% green now and am almost ready for a PR; but there is still a bit of complexity to solve. Well, it's quite a patch on htmlparser2 as well, and I don't know how Felix will deal with that. Without upstream JSX support in htmlparser2, this patch wouldn't be able to handle complex cases, which I think it should. I'll update my tests and the changes discussed to reflect that.

But now that this introduces so much change in the ecosystem, I'm asking myself whether it is really still wanted by everyone involved.

Let's see -- I'll open a PR in the htmlparser2 library with my changes, of course, so Felix can decide about that.

Until then, I also have an alternative solution for myself: I rewrote my code for step (1) last night, replacing linkedom there with a custom "good-enough" tokenizer/parser/transformer that is very limited to my narrow use case, but as such capable of dealing with even the most complex JSX cases. That's possible because I don't have to care about general-purpose standards compliance etc.

So... I think it would still be nice to have JSX support in linkedom, but it requires changes in htmlparser2 and a dependency version bump later as well. I'll open that PR there and update this one today/tomorrow anyway...

@WebReflection (Owner):

@kyr0 fair enough, and thanks for the update. I'll wait for your changes/future updates then, and keep this open for the time being.

@WebReflection marked this pull request as draft Nov 17, 2021
@WebReflection self-assigned this Nov 17, 2021
@WebReflection added the 3rd Party (issue from external dependencies) and enhancement (new feature or request) labels Nov 17, 2021
@WebReflection (Owner):

I am not against JSX, but this feels a bit stale and has conflicts ... I'd love to see an updated version of this PR if possible, otherwise I will close it.
