Why the range U+0320–U+03FF when computing spacing? #169

NSoiffer · 2022-09-26T17:59:37Z

This is separated out from #167 since the other issues there are settled and #167 should be closed for CR.

Core says:

If Content is a single character in the range U+0320–U+03FF then exit with category Default.

That range makes no sense to me. It covers part of the combining chars and also the Greek/Coptic chars. I think maybe it is trying to capture the combining chars, but the combining chars range is U+0300–U+036F. There are additional combining chars, U+1AB0–U+1AFF and U+1DC0–U+1DFF, that maybe should be included.

And from a later comment:

I still don't see why U+0320–U+03FF makes sense. Why are some combining chars included in the range and not others? Why is a Greek alpha treated differently from a Latin a? Although you (@fred-wang) don't need to include text in the spec explaining why this is so, it seems like a bug to me, so you should explain why it isn't a bug.

fred-wang · 2022-09-27T05:16:18Z

U+0320–U+03FF are not part of the operator dictionary, so they must return the default category. But as I previously mentioned, item 2 of https://w3c.github.io/mathml-core/#dfn-algorithm-to-determine-the-category-of-an-operator also remaps characters from Operators_2_ascii_chars into this range (so they can be handled by the compact dictionary), and consequently this early return of the Default category is necessary. I'll add a WPT test to verify that, so that an implementer does not forget that step.
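To make the collision concrete, here is a minimal Python sketch of the remapping described above. It is not the spec's algorithm: the list contents and their order are a made-up subset standing in for Operators_2_ascii_chars, the function name `normalize` is hypothetical, and the handling of combining overlays and other two-character cases is omitted.

```python
# Hypothetical subset and ordering; the real Operators_2_ascii_chars
# list is defined in the MathML Core spec.
OPERATORS_2_ASCII_CHARS = ["!=", "&&", "<=", "=="]

REMAP_BASE = 0x0320      # first code point of the range in question
RESERVED_LAST = 0x03FF   # last code point of the range in question


def normalize(content: str):
    """Return the code point to look up, or None for category Default."""
    if len(content) == 1:
        cp = ord(content)
        if REMAP_BASE <= cp <= RESERVED_LAST:
            # Early exit: a literal character in this range must not be
            # mistaken for a remapped two-character operator.
            return None
        return cp
    if content in OPERATORS_2_ASCII_CHARS:
        # Two ASCII characters are folded into one slot of the range.
        return REMAP_BASE + OPERATORS_2_ASCII_CHARS.index(content)
    return None
```

Without the early exit, a literal U+0320 in an `<mo>` would produce the same lookup value as the first remapped two-character operator, and a Greek alpha (U+03B1) could hit whatever entry happened to land on that slot.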

davidcarlisle · 2022-09-29T10:23:31Z

@fred-wang I think it's reasonable to ask, though, why that range, especially as it uses all the standard Greek code points. Why isn't a range from the Private Use Area used here, as it's just an internal mapping of the tables?

fred-wang · 2022-09-29T10:42:23Z

AFAIK, it is still possible to use PUA characters in <mo> and they should have default spacing, so I'm not sure how that would help... And note that these values are transformed in step 3 to produce a key (code point + form) encoded on 14 bits.
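The arithmetic behind a 14-bit key can be sketched as follows. The layout here is an assumption not stated in this thread: 12 bits for a compacted code point (U+0000–U+03FF kept as is, U+2000–U+2BFF shifted down by 0x1C00) plus an offset of 0x0000, 0x1000, or 0x2000 for the form. The name `make_key` is hypothetical.

```python
# Assumed form offsets; they occupy the bits above the 12-bit code point.
FORM_OFFSET = {"infix": 0x0000, "prefix": 0x1000, "postfix": 0x2000}


def make_key(code_point: int, form: str):
    """Pack (code point, form) into a small integer, or None if out of range."""
    if 0x0000 <= code_point <= 0x03FF:
        compact = code_point            # 0x0000..0x03FF
    elif 0x2000 <= code_point <= 0x2BFF:
        compact = code_point - 0x1C00   # 0x0400..0x0FFF
    else:
        return None                     # not representable: category Default
    return compact + FORM_OFFSET[form]
```

Under these assumptions the largest key is 0x2FFF, which fits in 14 bits, and U+03FF is the last code point of the low block. A Private Use Area code point such as U+E000 falls outside both blocks, so it could not serve as a remap target without widening the key.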

davidcarlisle · 2022-09-29T11:34:33Z

Ah, 14 bits and 03FF, which explains the range, which I guess answers @NSoiffer's question. Maybe we should say that so it doesn't look like we are ignoring Greek. I agree it makes no difference in practice, as single-letter Greek, like single-letter Latin, is never going to need an opdict entry, so the slots are "free".

fred-wang · 2022-12-08T15:05:31Z

I'm not sure what the next actionable step is. AFAIK the text in the spec is correct and covered by tests.

NSoiffer · 2022-12-08T23:27:13Z

Choosing this range is clever but "random" (there are plenty of other ranges from other alphabets that I think could be used). I think an informative note (just one or two sentences similar to your comment) in the spec as to why this is done is appropriate. Specs should not have mysteries buried in them.
