Why the range U+0320–U+03FF when computing spacing? #169

Open
NSoiffer opened this issue Sep 26, 2022 · 6 comments

@NSoiffer
Contributor

This is separated out from #167 since the other issues are settled and it should be closed for CR.

Core says:

If Content is a single character in the range U+0320–U+03FF then exit with category Default.

That range makes no sense to me. It covers part of the combining chars and also the Greek/Coptic chars. I think maybe it is trying to capture the combining chars, but the combining chars range is U+0300–U+036F. There are additional combining chars at U+1AB0–U+1AFF and U+1DC0–U+1DFF that maybe should be included.

And from a later comment:

I still don't see why U+0320–U+03FF makes sense. Why are some combining chars included in the range and not others? Why is a Greek alpha treated differently from a Latin a? Although you (@fred-wang) don't need to include text in the spec explaining why this is so, it seems like a bug to me, so you should explain why it isn't a bug.

@fred-wang
Contributor

U+0320–U+03FF are not part of the operator dictionary, so they must return the default category. But as I previously mentioned, item 2 of https://w3c.github.io/mathml-core/#dfn-algorithm-to-determine-the-category-of-an-operator also remaps entries from Operators_2_ascii_chars into this range (so they can be handled by the compact dictionary), and consequently this early return of the Default category is necessary. I'll add a WPT test to verify that, so that implementers do not forget that step.
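To make the collision concrete, here is a minimal Python sketch of the remapping and the early return described above. It is an illustration under assumptions, not the spec's algorithm: `OPERATORS_2_ASCII_CHARS` is a hypothetical four-entry stand-in for the spec's table (its real contents and order are not reproduced here), `preprocess` is an invented helper name, and every other case the spec handles is omitted.

```python
# Hypothetical stand-in for the spec's Operators_2_ascii_chars table.
# The entries and their order are illustrative only.
OPERATORS_2_ASCII_CHARS = ["!=", "->", "<=", ">="]

REMAP_BASE = 0x0320    # first code point of the reserved range
RESERVED_END = 0x03FF  # last code point of the reserved range


def preprocess(content):
    """Return the code point to look up, or None for category Default."""
    if len(content) == 1:
        code_point = ord(content)
        # Early return: a literal character in the reserved range must not
        # be mistaken for a remapped two-character operator.
        if REMAP_BASE <= code_point <= RESERVED_END:
            return None
        return code_point
    if content in OPERATORS_2_ASCII_CHARS:
        # Remap the two-character operator to a single slot in the range.
        return REMAP_BASE + OPERATORS_2_ASCII_CHARS.index(content)
    return None


print(hex(preprocess("!=")))   # slot assigned to the first table entry
print(preprocess("\u0320"))    # literal U+0320 takes the early return
print(preprocess("\u03b1"))    # Greek alpha takes the early return too
```

Without the range check, a literal U+0320 inside `<mo>` would reach the same lookup slot as the first remapped operator and pick up its spacing.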

@davidcarlisle
Collaborator

davidcarlisle commented Sep 29, 2022

@fred-wang I think it's reasonable to ask, though, why that range, especially as it uses all the standard Greek code points. Why isn't a range from the Private Use Area used here, since it's just an internal mapping of the tables?

@fred-wang
Contributor

AFAIK, it is still possible to use PUA characters in <mo> and they should have default spacing, so I'm not sure how that would help... And note that these values are transformed in step 3 to produce a key (code point + form) encoded on 14 bits.

@davidcarlisle
Collaborator

Ah, 14 bits, 03FF, which explains the range, and which I guess answers @NSoiffer's question. Maybe we should say that so it doesn't look like we are ignoring Greek. I agree it makes no difference in practice, as single-letter Greek, like single-letter Latin, is never going to need an opdict entry, so the slots are "free".
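The bit budget can be checked with a short Python sketch. The thread only states that the key (code point + form) is encoded on 14 bits; the layout below is an assumption for illustration: 2 bits for the three forms (infix, prefix, postfix) placed above a 12-bit code point field. `pack_key` is a hypothetical helper, and the spec's actual transformation in step 3 may differ.

```python
# Assumed layout: [ form: 2 bits ][ code point: 12 bits ] = 14 bits total.
FORMS = ("infix", "prefix", "postfix")
CODE_POINT_BITS = 12
FORM_BITS = 2


def pack_key(code_point, form):
    """Pack a code point and a form into one key under the assumed layout."""
    if not 0 <= code_point < (1 << CODE_POINT_BITS):
        raise ValueError("code point does not fit in the code point field")
    return (FORMS.index(form) << CODE_POINT_BITS) | code_point


# Both ends of the reserved range fit in the 12-bit field.
low = pack_key(0x0320, "infix")
high = pack_key(0x03FF, "postfix")
print(hex(low), hex(high))
print(high < (1 << (CODE_POINT_BITS + FORM_BITS)))
```

Under this assumption, any remap target has to sit low enough to fit in the code point field, which U+0320–U+03FF does and a Private Use Area code point such as U+E000 would not.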

@fred-wang
Contributor

I'm not sure what the next actionable step is. AFAIK the text in the spec is correct and covered by tests.

@NSoiffer
Contributor Author

NSoiffer commented Dec 8, 2022

Choosing this range is clever but "random" (there are plenty of other ranges from other alphabets that I think could be used). I think an informative note (just one or two sentences similar to your comment) in the spec as to why this is done is appropriate. Specs should not have mysteries buried in them.

3 participants