Skip to content

Operator dictionary: Order of preference for forms #167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fred-wang opened this issue Sep 20, 2022 · 10 comments
Closed

Operator dictionary: Order of preference for forms #167

fred-wang opened this issue Sep 20, 2022 · 10 comments

Comments

@fred-wang
Copy link
Contributor

From MathML full ( https://w3c.github.io/mathml/#presm_formdefval ):

If the operator does not occur in the dictionary with the specified form, the renderer should use one of the forms that is available there, in the order of preference: infix, postfix, prefix

However, MathML Core tries prefix before postfix:

Set Category to the result of the algorithm to determine the category of an operator (Content, Form) where Form is infix.
If Category is Default, then run the algorithm again with Form set to prefix.
If Category is Default, then run the algorithm again with Form set to postfix.

I believe this is a mistake.

@fred-wang fred-wang changed the title Operator dictionary: Order of preference when looking for forms Operator dictionary: Order of preference for forms Sep 20, 2022
@davidcarlisle
Copy link
Collaborator

@fred-wang we should certainly align them but it's not clear either version is ideal.

If I decide to use + as a postfix operator

<mrow><mi>x</mi><mo form=postfix>+</mo></mrow>

The current dictionary only has prefix and infix:

         <operator-dictionary priority="400" form="infix" lspace="4" rspace="4"/>
         <operator-dictionary priority="720" form="prefix" lspace="0" rspace="0"/>
         <description unicode="1.1">PLUS SIGN</description>

so the full version seems to say to take the infix spacing and so

x​​​␣+␣

I think the expectation for "postfix, not in dictionary" would be "style like factorial" or equivalently "add no space" and give

x+

@NSoiffer ?

@NSoiffer
Copy link
Contributor

I agree that the two specs should not conflict in the algorithm. I don't remember why full specifies the order "infix, postfix, prefix". @fred-wang: is there a reason you changed that in core?

I looked at what is written for core for some details and it seems like there are some bugs in the core spec:

  1. "If Content is a single character in the range U+0320–U+03FF then exit with category Default." That ranges makes no sense to me. It covers part of the combining chars and also the Greek/Coptic chars. I think maybe it is trying to capture the combining chars, but the combining chars range is U+0300 - U+036F. There are additional combining chars 1AB0–1AFF and 1DC0–1DFF that maybe should be included.
  2. "If Form is infix and Content corresponds to one of U+007C VERTICAL LINE or U+223C TILDE OPERATOR then exit with category ForceDefault." That is the only mention of "ForceDefault", so it is undefined. Should it be "Default"?

Additionally, I think what follows the quoted text in '2' should start a new numbered item.

@fred-wang
Copy link
Contributor Author

@davidcarlisle: For the Core spec, fallback is only performed if the form was not explicitly specified so in that case it will just use the default spacing. In any case, I personally don't think we are going to solve all the issues with the automatic property calculation based on the operator dictionary... if people really want some specific spacing or properties, I expect they would use explicit attributes. Also, I won't be very happy if that ends up modifying the dictionary again after having updated Core/WPT/Chromium and now that I'm struggling to update Firefox's one. And I think the issue you raised is a bit tangential to the one I reported :-)

@NSoiffer: Regarding the present bug, I think it's a mistake from my side, I don't remember we discussed such a change. Regarding 1. this is just to discard characters that are not in the dictionary in order to leave room for remapping the Operators_2_ascii_chars. Regarding 2. this was explained in #104 ; but I forgot to add it to https://w3c.github.io/mathml-core/#operator-dictionary-categories-values ; will do it later.

I wrote a patch to check the problematic entries i.e. those having only two forms that are postix & prefix:

diff --git a/tables/operator-dictionary.py b/tables/operator-dictionary.py
index 03981b2..3691dfc 100755
--- a/tables/operator-dictionary.py
+++ b/tables/operator-dictionary.py
@@ -194,12 +194,19 @@ multipleCharTable = []
 otherEntries={}
 otherValuesCount={}
 otherValueTotalCount=0
+operatorForms = {}
 
 for entry in root:
     unicodeText = entry.get("unicode")
     characters = parseHexaSequence(unicodeText)
 
     form = entry.get("form")
+    if unicodeText not in operatorForms:
+        operatorForms[unicodeText] = {}
+    operatorForms[unicodeText][form] = "%s %s %s" % (entry.get("lspace"),
+                                                     entry.get("rspace"),
+                                                     entry.get("properties"))
+
     key = buildKey(characters, form)
     value = {"form": form}
     parseSpaces(value, entry, ["lspace", "rspace"])
@@ -765,3 +772,11 @@ cpp.write("\n};")
 cpp.close()
 
 print("done.")
+
+for unicodeText in operatorForms:
+    forms = operatorForms[unicodeText]
+    if (("infix" not in forms) and
+        ("prefix" in forms) and
+        ("postfix" in forms)):
+        for form in forms:
+            print("%s %s %s" % (unicodeText, form, forms[form]))

The result is:

U00021 prefix 0 0 None
U00021 postfix 0 0 None
U02016 prefix 0 0 fence stretchy symmetric
U02016 postfix 0 0 fence stretchy symmetric
U02980 prefix 0 0 fence stretchy symmetric
U02980 postfix 0 0 fence stretchy symmetric
U02999 prefix 0 0 symmetric fence stretchy
U02999 postfix 0 0 symmetric fence stretchy

So prefix & postfix forms have exactly the same values and the order actually does not matter...

Regarding browsers:

Firefox does infix, postfix, prefix: https://searchfox.org/mozilla-central/rev/b1e5f2c7c96be36974262551978d54f457db2cae/layout/mathml/nsMathMLOperators.cpp#327 (also it uses the fallback even if the form is explicit).

Chromium does infix, prefix, postfix: https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/mathml/mathml_operator_element.cc;l=182;drc=324b75d72ef57dbdbc6147be5bf851284cd1c2bb

WebKit claims to do "infix, prefix, postfix" but actually do in alphabetical order "infix, postfix, prefix": https://searchfox.org/wubkat/rev/2f9450c601c6090870dcfe81c870bc54d880ebda/Source/WebCore/mathml/MathMLOperatorDictionary.cpp#1140

@davidcarlisle
Copy link
Collaborator

And I think the issue you raised is a bit tangential to the one I reported :-)

Possibly. I was just fishing for clarification. Your opening issue just pointed to a difference between the specs but did not suggest a resolution. As you opened it here I'm assuming you see the change required in Core rather than Full?

if that ends up modifying the dictionary again

I wasn't suggesting modifying the dictionary entries.

So prefix & postfix forms have exactly the same values and the order actually does not matter...
yes, I noticed that while originally looking for an example for this issue.

@NSoiffer
Copy link
Contributor

Sounds like everyone agree the exact order isn't that important. The only important thing is that the specs and browsers all agree. Given that two of the three browsers do "infix, postfix, prefix", and that's what is in full, it seems like the easiest thing is to change the order in the core spec and the chromium implementation.

@fred-wang: do you agree that the core spec needs fixing wrt what I think are bugs mentioned in my comment above?

@fred-wang
Copy link
Contributor Author

My proposal is just to change the core spec to use "infix, postfix prefix" and update Chrome (with no behavior change, so no tests needed). As In said, I believe this change was a mistake from my side.

@NSoiffer No, the U+0320–U+03FF range and ForceDefault category are done on purpose. Removing them would break the algorithm. I committed a fix for the missing row: 3f7714d

@davidcarlisle
Copy link
Collaborator

that change is ok with me, as I commented I'm not convinced this fallback lookup is that useful but it's always been in mathml and as you say changing that isn't in scope at this time

@fred-wang
Copy link
Contributor Author

removing fallback lookup would probably break content with bad mrow grouping, which is a reason why these were implemented in browsers e.g. https://searchfox.org/wubkat/rev/2f9450c601c6090870dcfe81c870bc54d880ebda/Source/WebCore/mathml/MathMLOperatorDictionary.cpp#1141

@NSoiffer
Copy link
Contributor

Glad to see ForceDefault is now defined.

I still don't see why U+0320–U+03FF makes sense. Why are some combining chars included in the range and not others? Why is a Greek alpha treated different than a latin a? Although you don't need include text in the spec why this is so, it seems like a bug to me so you should explain why it isn't a bug.

fred-wang added a commit that referenced this issue Sep 27, 2022
operator" [1] try fallback forms in the order infix, postfix, prefix.
This is consistent with MathML 4 and older versions and implementations
in WebKit and Firefox. With the new version of the operator dictionary
the order prefix/postfix does not matter and this change is not
testable. For details, see [2].

[1] https://w3c.github.io/mathml-core/#algorithm-for-determining-the-properties-of-an-embellished-operator
[2] #167
@fred-wang
Copy link
Contributor Author

WebKit claims to do "infix, prefix, postfix" but actually do in alphabetical order "infix, postfix, prefix": https://searchfox.org/wubkat/rev/2f9450c601c6090870dcfe81c870bc54d880ebda/Source/WebCore/mathml/MathMLOperatorDictionary.cpp#1140

I stand corrected, it seems entries are properly sorted in the order infix, prefix, postfix too. Anyway, I guess it's not a big deal to change that...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants