[css-ruby-1] Should auto-hide match use NFKC and/or strip white space? #5995

Open
fantasai opened this issue Feb 15, 2021 · 5 comments
Labels
css-ruby-1 Current Work i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. Needs Edits

Comments

@fantasai
Collaborator

https://lists.w3.org/Archives/Public/www-style/2016Dec/0108.html raised some use cases for ruby auto-hiding other than strict string equality. Many of the examples would require custom rules (which could be done manually with visibility: collapse), but some of these could be automatically solved by stripping white space and/or matching via NFKC. Should we enable such normalization for auto-hiding string comparison?

@fantasai fantasai added the css-ruby-1 Current Work label Feb 15, 2021
@r12a r12a added the i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. label Feb 15, 2021
@patrickdark
Contributor

I expanded upon the cited email at #5927 (comment).

Even though it wouldn't solve all of the cited issues, I think Unicode normalization is probably a good idea. It would address the fullwidth versus normal width punctuation cases between CJK and Latin languages where I would expect a match (and autohiding). I also would expect Hangul characters built from component characters (Hangul Jamo) to match the precomposed versions, though I'm not sure where that would occur in practice.

It would also allow matching in a weird case I ran into where I deliberately used a combining diacritic plus base letter variant of a character instead of the precomposed diacritic-plus-letter version because the latter was missing in a designer font I was using, so the former looked better. In this case, while I was using ruby, I didn't have to match anything, but would nevertheless expect a match in a comparison to the precomposed character.
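The three cases described above (fullwidth punctuation, Hangul built from jamo, and a base letter plus combining diacritic) can be checked with a minimal sketch using Python's standard `unicodedata` module. The helper name `equal_under` and the sample strings are illustrative assumptions, not anything from the spec or the cited email.

```python
import unicodedata


def equal_under(form: str, base: str, annotation: str) -> bool:
    """Compare two strings after applying the same Unicode normalization form."""
    return unicodedata.normalize(form, base) == unicodedata.normalize(form, annotation)


# Illustrative pairs (assumed samples, not taken from the thread):
fullwidth = ("\uff01", "!")                 # FULLWIDTH EXCLAMATION MARK vs ASCII "!"
hangul = ("\u1112\u1161\u11ab", "\ud55c")   # conjoining jamo vs precomposed syllable
accent = ("e\u0301", "\u00e9")              # e + COMBINING ACUTE ACCENT vs precomposed e-acute

for base, annotation in (fullwidth, hangul, accent):
    # First value: strict code point equality; second: equality under NFKC.
    print(base == annotation, equal_under("NFKC", base, annotation))
```

Each pair differs as a raw code point sequence but compares equal under NFKC.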

@fantasai
Collaborator Author

fantasai commented Oct 11, 2022

I can't find the minutes, so maybe @r12a or @aphillips or @frivoal can confirm if this discussion I remember actually happened. :) But IIRC the i18nWG concluded that NFKC would be too aggressive in at least some cases, but wanted to know if the CSSWG would consider NFC and/or ignoring white space.
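To make the NFC versus NFKC distinction concrete, here is a small sketch that applies both forms to a few characters. The thread does not say which cases the i18nWG considered too aggressive, so the samples below are assumptions chosen only to show how much more NFKC folds than NFC; the helper `forms` is hypothetical.

```python
import unicodedata


def forms(text: str) -> tuple[str, str]:
    """Return the (NFC, NFKC) normalizations of text."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFKC", text)


# Assumed samples for illustration only.
samples = {
    "circled digit one": "\u2460",
    "superscript two": "\u00b2",
    "fullwidth exclamation mark": "\uff01",
    "e + combining acute accent": "e\u0301",
}

for name, text in samples.items():
    nfc, nfkc = forms(text)
    # ascii() shows the code points unambiguously regardless of font support.
    print(f"{name}: NFC={ascii(nfc)} NFKC={ascii(nfkc)}")
```

NFC only composes canonically equivalent sequences, so the first three samples pass through unchanged, while NFKC also replaces compatibility characters such as the circled digit with their plain counterparts.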

@aphillips
Contributor

@fantasai It's here in our TPAC minutes--almost exactly as you remember it :-). Search for the word "hide" and the conversation proceeds from there.

@css-meeting-bot
Member

The CSS Working Group just discussed [css-ruby-1] Should auto-hide match use NFKC and/or strip white space?, and agreed to the following:

  • RESOLVED: only perform whitespace stripping before comparing the base and annotation texts
The full IRC log of that discussion
<fremy> fantasai: we have a feature in ruby where, if the annotation text and the base are identical, the annotation is hidden if they are presented on top of each other
<fremy> fantasai: but if they are side by side, they are kept for example
<fremy> fantasai: the question is "what is identical"?
<fremy> fantasai: should we normalize before doing this?
<fremy> fantasai: should we deal with white space
<fremy> fantasai: should we collapse unicode characters that merge in rendering if possible? (NFKC)
<fremy> fantasai: but the internationalization group thought it might be too aggressive in some cases
<fremy> fantasai: they recommended NFC instead
<TabAtkins> q+
<fremy> fantasai: which only deals with things that are simpler (e.g. A + an accent vs A accent)
<florian> q+
<fremy> fantasai: so, do we want to perform NFC before comparing the texts?
<astearns> ack TabAtkins
<fremy> TabAtkins: I support whitespace stripping
<fremy> TabAtkins: because it can be due to source code formatting
<fremy> TabAtkins: but I don't think we should do NFC because we don't do this elsewhere
<fremy> TabAtkins: I expect that authors use the same typing convention in the same markup
<fremy> TabAtkins: we are not comparing html vs css
<astearns> ack florian
<fremy> florian: I agree about whitespace
<fremy> florian: for normalization, I'm less sure
<fremy> florian: if one person types the text, and another the annotations
<fremy> florian: NFC is not very aggressive, I think it would make things more rational for users
<fremy> florian: however, it will be rare I think
<fremy> florian: but if it did occur, I think the correct behavior is to normalize
<fremy> florian: (so, preference for NFC, but not strong)
<jfkthame> +1 to nfc
<fremy> astearns: can we resolve on stripping whitespace, and leave off normalization?
<heycam> q+
<fremy> fantasai: I think yes, I agree with TabAtkins, we don't do it elsewhere
<fremy> fantasai: so it seems ok to drop this
<astearns> ack heycam
<fremy> heycam: this is just a content check, correct?
<fremy> heycam: we don't look at display:none etc... ?
<fremy> fantasai: we might be looking at display:none?
<fantasai> s/TabAtkins, we don't do it elsewhere/TabAtkins and Florian: it's definitely the right thing to do, but it's also not done elsewhere in the platform and is quite rare to mismatch/
<fremy> florian: but not generated content etc
<astearns> jfkthame: would you be OK not doing NFC, or would you prefer we resolve to use NFC?
<fremy> heycam: okay, hopefully the spec is very clear on that
<fremy> astearns: reading IRC comments
<fantasai> [note: those of us on the call are somewhat ambivalent about NFC, given pros and cons]
<jfkthame> astearns: I'd be ok with not, though I think it's less good (sorry, in another meeting)
<heycam> (I kind of don't quite understand the need for this automatic hiding, and why the author doesn't use visibility:hidden on ruby text that they know is the same as the base text)
<fremy> astearns: okay, since we have lots of doubts on NFC, let's just do whitespace and leave it at that
<fantasai> heycam, it's because whether it should be invisible or not depends on how it's styled
<fremy> florian: and also put an action on me to clarify the display:none behavior
<fantasai> heycam, and there's no selector for "this is the same text as the other thing" :)
<heycam> ok
<fantasai> heycam, plus it's what you want by default so we should do it by default
<fremy> astearns: so, the proposed resolution would be to only perform whitespace stripping before comparing the base and annotation texts
<fremy> astearns: any objection?
<fremy> RESOLVED: only perform whitespace stripping before comparing the base and annotation texts
<heycam> text-transform? :o
<fantasai> “The content comparison for auto-hiding takes place prior to white space collapsing (white-space) and text transformation (text-transform) and ignores elements (considers only the textContent of the boxes).
<fremy> ACTION florian: make sure the way to determine what text we are talking about (display:none, etc...)
<fantasai> ”
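
The resolution above can be sketched as a comparison that strips white space and does nothing else. This is a rough illustration, not the spec's algorithm. It assumes that "white space" means the CSS document white space characters (space, tab, line feed, carriage return, form feed), that stripping removes them everywhere rather than only at the ends, and that the inputs are the already-extracted textContent of the base and annotation. The function names are hypothetical.

```python
import re

# Assumption: only these characters count as white space for the comparison.
_WHITE_SPACE = re.compile(r"[ \t\n\r\f]+")


def strip_white_space(text: str) -> str:
    """Remove every run of the assumed white space characters from text."""
    return _WHITE_SPACE.sub("", text)


def should_auto_hide(base_text: str, annotation_text: str) -> bool:
    """Compare base and annotation text after white space stripping only.

    No Unicode normalization and no case mapping is applied, reflecting the
    decision to leave NFC out and to compare before text-transform.
    """
    return strip_white_space(base_text) == strip_white_space(annotation_text)


# Source formatting (newlines, indentation) no longer prevents a match.
print(should_auto_hide("\u6771\u4eac", "\n    \u6771\u4eac\n"))
# Canonically equivalent but differently encoded text still does not match.
print(should_auto_hide("e\u0301", "\u00e9"))
```

Under these assumptions the first call matches and the second does not, which is the trade-off the group accepted by leaving normalization out.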

@aphillips
Contributor

(responding to the IRC log discussion in the comment above)

Note that I18N spent a long time creating a document about string matching, Charmod-Norm. When specifying string matching or deciding which normalization to apply, consider referencing the best practices found there. In particular, I18N recommends against performing Unicode normalization for most matching regimes. I think our previous half-hearted recommendation to look at NFC for ruby base matching came out of a TPAC discussion in which NFKC was being considered. But upon reflection, if the base and ruby text were not encoded the same except under NFC, treating them as different would be unsurprising (and represents a pretty rare corner case in any event--the only case that springs to mind might be the handling of dakuten marks in Japanese, which are sometimes combining, but even then the difference might be intentional??)
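
The dakuten corner case can be illustrated with a short sketch. The specific kana chosen here (GA) is an assumption for demonstration; the thread does not name one.

```python
import unicodedata

# Assumed example: hiragana GA encoded two ways.
PRECOMPOSED = "\u304c"        # HIRAGANA LETTER GA
COMBINING = "\u304b\u3099"    # HIRAGANA LETTER KA + COMBINING VOICED SOUND MARK


def matches_after_nfc(a: str, b: str) -> bool:
    """Return True if the two strings are equal once both are in NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code point comparison treats the two encodings as different text.
print(PRECOMPOSED == COMBINING)
# NFC composes KA + combining dakuten into the precomposed GA.
print(matches_after_nfc(PRECOMPOSED, COMBINING))
```

Without normalization the two encodings stay distinct, so a base and annotation that differ only in this way would not be auto-hidden.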

6 participants