HTML Scanner doesn't unmangle html escapes #17903

Closed
Malien opened this issue May 6, 2025 · 1 comment

Malien commented May 6, 2025

What version of Tailwind CSS are you using?

4.1.5

What build tool (or framework if it abstracts the build tool) are you using?

I'm importing tailwindcss and @tailwindcss/oxide directly in my little impromptu build framework.

What version of Bun (instead of Node.js) are you using?

v1.2.11.

What browser are you using?

Latest builds of Chrome, Safari, and Firefox as of May 6, 2025. It doesn't matter in this case.

What operating system are you using?

macOS 15.4.1

Reproduction URL

https://codesandbox.io/p/devbox/3t9fnw

In fact, it's so small that I can include it right here as well:

import { Scanner } from "@tailwindcss/oxide";

const scanner = new Scanner({});
const candidates = scanner.scanFiles([
  {
    content: `
        <html>
        <body class="group">
            <div class="group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500 bg-blue-500"></div>

            <input name="groupped" value="foo">
            <input name="groupped" value="bar">
        </body>
        </html>
    `,
    extension: "html",
  },
]);

console.log(candidates);
// ^ should contain: group-has-[input[value="foo"]:checked]:bg-red-500
//              not: group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500

// Same applies to all other kinds of HTML escapes like unicode code points, etc.

Describe your issue

Given class="group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500", the scanner leaves &quot; still escaped in the CSS output.

I encountered this edge case while building a static site with react-dom's new renderToStaticMarkup. It turns out that it escapes everything it can, even when not strictly necessary, such as single quotes in double-quoted attributes and vice versa.

I think @tailwindcss/oxide's Scanner should first decode HTML escapes when extension: "html" is set.

@RobinMalfait RobinMalfait self-assigned this May 6, 2025
RobinMalfait (Member) commented

Hey!

The scanner essentially treats all content files as plain text. We do have some pre-processors for specific templating languages (to make them scannable), but that's about it.

We don't run, parse, or try to understand the source files. Always pre-processing HTML files doesn't seem ideal, because the HTML you typically write won't include those escapes, which means there would be an additional cost for everyone writing HTML files.

A similar issue exists in JS files (see last example: https://tailwindcss.com/docs/adding-custom-styles#handling-whitespace)

Think of it this way: we extract parts of whatever you pass in. If you pass in HTML-encoded strings, we deliver the candidates including the encodings.

It also looks like you are manually passing strings to the scanner, so I would recommend doing the HTML decoding yourself before passing them to the scanner.
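
A minimal sketch of what that decoding step might look like, assuming only a handful of named entities plus numeric character references need handling. `decodeHtmlEntities` and `NAMED_ENTITIES` are hypothetical names for illustration and are not part of tailwindcss or @tailwindcss/oxide; a full decoder would need the complete HTML named-entity table.

```typescript
// Hypothetical helper, not part of tailwindcss or @tailwindcss/oxide.
// Assumption: only these five named entities matter; extend the map as needed.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeHtmlEntities(input: string): string {
  // A single pass, so "&amp;quot;" decodes to "&quot;" and is not decoded twice.
  return input.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched instead of throwing.
        if (codePoint > 0x10ffff) return match;
        return String.fromCodePoint(codePoint);
      }
      // Unknown named entities are left as-is.
      return NAMED_ENTITIES.get(body) ?? match;
    },
  );
}

// The class attribute from the reproduction, as emitted with escaped quotes.
const html =
  '<div class="group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500 bg-blue-500"></div>';
const decoded = decodeHtmlEntities(html);
console.log(decoded);
```

The decoded string is meant only as scanner input: it could be passed as `content` to `scanner.scanFiles` in place of the raw markup, while the escaped HTML is still what gets served.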

We can revisit this if more people run into it, but I don't think it's a good idea right now to pre-process all HTML files and replace those special characters, since they won't be there in "normal" HTML that a human writes (unless they explicitly want that, of course).

Hope this helps!
