HTML `Scanner` doesn't unmangle html escapes #17903

Malien · 2025-05-06T20:48:14Z

What version of Tailwind CSS are you using?

4.1.5

What build tool (or framework if it abstracts the build tool) are you using?

I'm importing tailwindcss and @tailwindcss/oxide directly in my little impromptu build framework.

What version of ~~Node.js~~ Bun are you using?

v1.2.11.

What browser are you using?

Latest builds of Chrome, Safari, Firefox as of 06.05.2025. It doesn't matter in this case.

What operating system are you using?

macOS 15.4.1

Reproduction URL

https://codesandbox.io/p/devbox/3t9fnw

In fact it's so small, I can throw it right here as well:

import { Scanner } from "@tailwindcss/oxide";

const scanner = new Scanner({});
const candidates = scanner.scanFiles([
  {
    content: `
        <html>
        <body class="group">
            <div class="group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500 bg-blue-500"></div>

            <input name="groupped" value="foo">
            <input name="groupped" value="bar">
        </body>
        </html>
    `,
    extension: "html",
  },
]);

console.log(candidates);
// ^ should contain: group-has-[input[value="foo"]:checked]:bg-red-500
//              not: group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500

// Same applies to all other kinds of HTML escapes like unicode code points, etc.

Describe your issue

Given class="group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500", the scanner leaves &quot; undecoded in the CSS output.

I encountered this edge case while building a static site with react-dom's new renderToStaticMarkup. It turns out that it escapes everything it can, even when not strictly necessary, such as single quotes in double-quoted attributes and vice versa.

I think the @tailwindcss/oxide Scanner should first decode HTML escapes when extension: "html" is set.


RobinMalfait · 2025-05-06T21:09:47Z

Hey!

The scanner essentially treats all content files as plain text. We do have some pre-processors for specific templating languages (to make it scannable) but that's about it.

We don't run, parse, or try to understand the source files. Always pre-processing HTML files seems less than ideal: the HTML you typically write won't include those escapes, so everyone writing HTML files would pay an additional cost.

A similar issue exists in JS files (see last example: https://tailwindcss.com/docs/adding-custom-styles#handling-whitespace)

Think of it this way, we extract parts of whatever you pass in. If you pass in HTML encoded strings, we deliver the candidates including the encodings.

It also looks like you are manually passing strings to the scanner, so I would recommend doing the HTML decoding yourself before passing the content to the scanner.
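
For illustration, the decode-first approach could look roughly like the sketch below. The `decodeHtmlEntities` helper and its entity table are hypothetical and are not part of @tailwindcss/oxide. It handles only the five predefined named entities plus decimal and hexadecimal numeric references, so a full entity table (or an existing decoding library) would be needed for anything beyond that.

```typescript
// Hypothetical helper (not part of @tailwindcss/oxide): decodes a small
// subset of HTML character references before the content is scanned.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

// Convert a numeric reference to its character, or keep the original
// text when the value is not a valid Unicode code point.
function fromCodePointOr(code: number, fallback: string): string {
  if (!Number.isInteger(code) || code < 0 || code > 0x10ffff) {
    return fallback;
  }
  return String.fromCodePoint(code);
}

function decodeHtmlEntities(input: string): string {
  // A single pass, so "&amp;quot;" becomes "&quot;" and is not decoded twice.
  return input.replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z]+));/g,
    (match: string, hex?: string, dec?: string, name?: string): string => {
      if (hex !== undefined) return fromCodePointOr(parseInt(hex, 16), match);
      if (dec !== undefined) return fromCodePointOr(parseInt(dec, 10), match);
      if (name !== undefined) return NAMED_ENTITIES.get(name) ?? match;
      return match;
    },
  );
}

// Example input taken from the reproduction in this issue.
const rawClass = "group-has-[input[value=&quot;foo&quot;]:checked]:bg-red-500";
console.log(decodeHtmlEntities(rawClass));
```

The decoded string would then be passed as `content` to `scanner.scanFiles`, in place of the raw renderToStaticMarkup output shown in the reproduction.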

We can revisit this if more people are running into this, but I don't think it's a good idea right now to pre-process all HTML files and replace those special characters since in "normal" HTML a human writes, those won't be there (unless they explicitly want that of course).

Hope this helps!

RobinMalfait self-assigned this May 6, 2025

RobinMalfait closed this as completed May 6, 2025
