
[css-pseudo] Consider using Unicode ZWJ and ZWNJ to control :first-letter inclusion #6242

Closed
faceless2 opened this issue Apr 26, 2021 · 3 comments
Labels
css-pseudo-4 Current Work i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@faceless2

I'm proposing we introduce control over which letters are considered part of the first-letter by using Unicode joiners, by specifying that the first-letter pseudo-element must not break at a ZWJ and must break at ZWNJ.

That would allow us to support the example in #3208 - two initial "V" letters forming an archaic "W", which could be represented as V‍V. It would also be a simpler way of solving some - although not all - of the various use-cases raised in #2040. I understand the requirements raised in that issue, but the solution is quite complex. Offering a quick and easy way of solving several of those cases with markup might be easier to understand for authors.

ZWJ/ZWNJ are currently not mentioned in this area of the spec at all, but it's acknowledged that the first letter might be more than a single base character: the Dutch "IJ" ligature is given as an example. Cases where ZWJ or ZWNJ might already exist in this context are where the first letter is an emoji or from the Arabic family (also theoretically seen in Hangul). In those cases the intent is to build a single typographic unit from multiple codepoints, which would all be part of the first letter if it applied. So there's no compat issue here that I can see.

Finally, and by far the most important, it should be very easy to implement. The existing logic that scans the start of the text for punctuation to determine where the first letter ends would just need adjusting to add tests for ZWJ and ZWNJ as well.
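
As a rough sketch of the adjustment described above (not the spec's algorithm or any engine's code), the scan could look something like this. The function names are hypothetical, and the pattern is simplified on purpose: any Unicode P* category is treated as punctuation, and a "letter" is one base character plus its combining marks.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def _is_punct(ch: str) -> bool:
    # Simplifying assumption: every P* general category counts as punctuation.
    return unicodedata.category(ch).startswith("P")


def _unit_end(text: str, i: int) -> int:
    """Return the index just past one base character and its combining marks."""
    if i >= len(text):
        return i
    i += 1
    while i < len(text) and unicodedata.category(text[i]).startswith("M"):
        i += 1
    return i


def first_letter_end(text: str) -> int:
    """Hypothetical scanner: index at which the first-letter text would end."""
    i = 0
    # Leading punctuation is included in the first letter.
    while i < len(text) and _is_punct(text[i]):
        i += 1
    if i >= len(text):
        return i
    i = _unit_end(text, i)
    # Proposed rule: do not break at ZWJ, so pull in the unit that follows it.
    while i + 1 < len(text) and text[i] == ZWJ:
        i = _unit_end(text, i + 1)
    # Proposed rule: ZWNJ forces a break, so trailing punctuation is not scanned.
    if i < len(text) and text[i] == ZWNJ:
        return i
    # Existing behaviour: trailing punctuation is included.
    while i < len(text) and _is_punct(text[i]):
        i += 1
    return i
```

With this sketch, `first_letter_end("VVilliam")` stops after the first "V", while inserting a ZWJ between the two "V" letters extends the first letter to cover both.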

(originally an idea from #3208)

@faceless2 faceless2 added the css-pseudo-4 Current Work label Apr 26, 2021
@fantasai fantasai added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Dec 29, 2022
@r12a
Contributor

r12a commented Jan 18, 2023

I see these problems with this approach:

  1. It changes the semantics of the character sequence for presentational purposes that have nothing to do with the character sequence itself.
  2. It provides another way of achieving the first-letter segmentation when there are already two (::first-letter and markup).
  3. It may interfere with the expected behaviour of ZWNJ as outlined in the Unicode standard.

An example of the latter is the use of ZWNJ in Bengali to produce an alternate ligature for CV combinations. Even though a ZWNJ occurs between the two characters, they must not be split by first-letter styling.

For an example, see https://r12a.github.io/scripts/beng/bn.html#vowelligatures (currently figure 13).
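
To make the objection concrete, here is a minimal sketch of what a "must break at ZWNJ" rule would do to a consonant + ZWNJ + vowel-sign sequence. The specific code points (U+0997 BENGALI LETTER GA followed by U+09C1 BENGALI VOWEL SIGN U) are an assumption chosen for illustration; the linked page has the attested examples. The helper names are hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def split_at_zwnj(text: str) -> tuple[str, str]:
    """Naive rule under discussion: the first letter ends at the first ZWNJ."""
    head, _sep, tail = text.partition(ZWNJ)
    return head, tail


def starts_with_mark(s: str) -> bool:
    """True if the string begins with a combining mark (general category M*)."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")


# Assumed illustrative sequence: consonant, ZWNJ, dependent vowel sign.
word = "\u0997" + ZWNJ + "\u09c1"
first, rest = split_at_zwnj(word)

# The vowel sign is left without its base consonant, which is the split
# that must not happen for first-letter styling.
orphaned = starts_with_mark(rest)
```

Here `first` holds only the consonant and `orphaned` is true, because the remainder begins with a combining vowel sign that has been separated from its base.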

If the text is to be manipulated, then I think it makes more sense to use markup to apply the presentation.

@faceless2
Author

Thank you, I wasn't aware of the use in Bengali. I think that effectively rules out using ZWNJ to force the end of the first-letter segment. I'm not sure it rules out using ZWJ to add extra letters, which does

Re your other points, I certainly agree with your first. For the second, we still have no control over what's included and what isn't: it's determined by the pattern defined in https://w3c.github.io/csswg-drafts/css-pseudo-4/#first-letter-pattern.

However, now that we have ::prefix and ::postfix to style the punctuation, this issue is much less important. The only case this would fix is the one from #3208 (two initial "V" letters forming an archaic "W"), and that's very obscure.

So given your objections and the existence of ::prefix/::postfix, I don't think this issue is going to solve much, so I'll close.

@r12a
Contributor

r12a commented Jan 19, 2023

fwiw, Khmer has another example of ZWNJ appearing inside a cluster which mustn't be split for display. See the end of the subsection Register-shifter position at https://r12a.github.io/scripts/khmr/km.html#consonant_shift_posn where ZWNJ changes the default position of the consonant shifter. (currently also figure 13)

And another just below wrt managing ligation at https://r12a.github.io/scripts/khmr/km.html#vowel_ligatures
