
RFC: Use Rust's string-to-number conversions, instead of home-grown ones #156


Closed

Conversation

@federicomenaquintero (Contributor) commented on Jun 14, 2017

Rust's standard library contains pretty sophisticated code for parsing numbers from strings. This pull request modifies the Tokenizer to use it, instead of the trivial "multiply by 10 and add the digit" algorithm.
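
For concreteness, here is a minimal sketch of the two approaches (an illustration only, not the actual Tokenizer code; the real accumulation logic in consume_numeric() differs in detail, which is exactly where rounding differences can creep in):

    use std::str::FromStr;

    fn main() {
        let s = "0.67";

        // Hypothetical digit-accumulation parse, in the spirit of the
        // "multiply by 10 and add the digit" algorithm. Fractional digits
        // are added in with a shrinking power-of-ten factor; because 0.1
        // has no exact binary representation, error can accumulate here.
        let mut integral = 0f64;
        let mut fraction = 0f64;
        let mut factor = 0.1f64;
        let mut in_fraction = false;
        for c in s.chars() {
            match c {
                '.' => in_fraction = true,
                _ => {
                    let d = c.to_digit(10).unwrap() as f64;
                    if in_fraction {
                        fraction += d * factor;
                        factor *= 0.1;
                    } else {
                        integral = integral * 10. + d;
                    }
                }
            }
        }
        let naive = (integral + fraction) as f32;

        // The standard library's parser, which rounds correctly to the
        // nearest representable f32.
        let std_parsed = f32::from_str(s).unwrap();

        println!("naive: {}", naive);
        println!("std:   {}", std_parsed);
    }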

However, this makes the tests fail!

For example, one of the tests parses a ".67%" token out of a bigger component-value-list, and the token has a spec like

	["percentage", "0.66999996", 0.67, "number"], " ",

and it fails with

        [
          "percentage",
result    "0.67",
result    0.6700000166893005,
expect    "0.66999996",
expect    0.6700000000000002,
          "number"
        ],

Some things to note:

  • The test code uses the rustc-serialize crate to read/write JSON, but that crate says it has been deprecated in favor of Serde.
  • rustc-serialize also uses a simple algorithm like rust-cssparser's, parsing numbers by multiplying by 10 and adding each digit.
  • The test code uses rust-cssparser's serializer.rs to generate JSON out of the parsed tokens. It writes the string representations of numbers by doing a simple write!(writer, "{}", float). In the example above, this is what generates the "0.67" in the first result line, but that is different from the test fixture's expected "0.66999996". I am not sure which side to trust :)
  • The result/expected numbers (not just the stringified values) also differ in the example above; note that the JSON code assumes f64, while Token uses f32. Also, almost_equals() in tests.rs requires the absolute difference between result and expected to be < 1e-6 (see the sketch after this list) - I haven't dug deep enough to see why the comparison fails for the example above.
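
For reference, the tolerance check described in the last bullet amounts to something like this (a sketch, not the exact almost_equals() from tests.rs, which walks the JSON structure):

    // Sketch of an absolute-tolerance float comparison like the one
    // described for almost_equals() in tests.rs.
    fn almost_equals(a: f64, b: f64) -> bool {
        (a - b).abs() < 1e-6
    }

    fn main() {
        // The result/expected pair from the failing test above:
        println!("{}", almost_equals(0.6700000166893005, 0.6700000000000002));
    }

On this pair the check prints true (the difference is about 1.7e-8), which suggests the failure may come from comparing the stringified values ("0.67" vs. "0.66999996") rather than the floats themselves.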

I'm fine with this pull request not being merged, since the tests fail - but how were those test values generated in the first place?

Comments on the above are appreciated.


@SimonSapin (Member)

How is the performance of std’s number parsing? This is performance-sensitive, since the CSSOM is based on strings (e.g. JavaScript code like element.style.top = some_number + "px").

@SimonSapin (Member)

CC @bzbarsky

@bzbarsky (Contributor)

I, too, would want to see some performance numbers.

I would also like to know what the rounding behavior is. That is, it sounds like the std algorithm is producing different numbers from the "multiply and add" thing. Why is that? What are the actual underlying f32 values produced from the "0.67" input by the two algorithms?

Note also that serialization of floats in Rust via the {} formatting machinery is unfortunately kinda broken: it tries to output more digits of precision than floats actually have, so you end up with weirdness. See also servo/servo#17205. But in this case, since we were using {} both before and after, it sounds like the actual f32 did in fact change.
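
To make the widening issue concrete: the 0.6700000166893005 in the failing output above is what you get by printing the f32 nearest to 0.67 after casting it to f64 (a small demonstration; the exact digits {} prints may vary across Rust versions):

    fn main() {
        let v: f32 = 0.67;
        println!("{}", v);        // prints "0.67"
        println!("{}", v as f64); // prints "0.6700000166893005":
                                  // widening exposes the f32's exact value,
                                  // and {} then shows all those digits
    }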

@federicomenaquintero (Contributor, author)

I've started https://github.com/federicomenaquintero/float-tests to see what the various parsers generate. The Cargo.toml has URLs for the stock rust-cssparser and my version.

I'm printing ToCss::to_css_string()'s result out of a parsed token, and so far both versions produce the same results. I'm not sure yet why rust-cssparser's tests work in the stock version, nor why the differences happen in that string for 0.67.

I'll look into writing a benchmark for both versions.
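
A minimal shape for such a benchmark might be the following (a sketch using std::time::Instant; the second arm would call into the tokenizer's own parsing path, and a real measurement would want proper repetition and warm-up):

    use std::str::FromStr;
    use std::time::Instant;

    fn main() {
        // Synthetic inputs shaped like CSS numbers, e.g. "123.45".
        let inputs: Vec<String> = (0..1_000_000)
            .map(|i| format!("{}.{:02}", i % 1000, i % 100))
            .collect();

        let start = Instant::now();
        let mut checksum = 0f32; // keep the work from being optimized away
        for s in &inputs {
            checksum += f32::from_str(s).unwrap();
        }
        println!("f32::from_str: {:?} (checksum {})", start.elapsed(), checksum);
    }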

@federicomenaquintero (Contributor, author)

OK, found it.

Summary: for Percentage values, the numbers in the expected JSON fixture have been divided and then multiplied by 100, and that round trip loses precision relative to the originally parsed values.

I put some println!()s right where the float value is built, and then when it is serialized to JSON.

consume_numeric() does this:

        return Percentage(PercentageValue {
            unit_value: (value / 100.) as f32,
            ...
        });

And later, impl ToCss for Token's to_css() does this:

            Token::Percentage(PercentageValue { unit_value, int_value, has_sign }) => {
                let value = NumericValue {
                    value: unit_value * 100.,
                    ...
                };
                try!(write_numeric(value, dest));

where write_numeric() simply does write!(writer, "{}", value).

The original multiply-by-ten code outputs this:

percentage: '.67%' 0.6700000000000002
serializing percentage: 0.66999996

(That's the string slice and the value as parsed; the second line is that value after the divide-by-100 / multiply-by-100 round trip.)

My version with f32::from_str():

percentage: '.67%' 0.67
serializing percentage: 0.67

The test program from my float-tests repo prints this at the end, when used with the stock rust-cssparser:

Token.to_css_string() = "0.66999996%"
Token::Percentage(PercentageValue.value) - value: 0.0067, bits: 111011110110111000101110101100 (0x3bdb8bac)

And when run with my patched version:

Token.to_css_string() = "0.67%"
Token::Percentage(PercentageValue.value) - value: 0.0067000003, bits: 111011110110111000101110101101 (0x3bdb8bad)

The f32s differ by a single bit (one ulp), so maybe the /100 → *100 round trip is not too bad :)
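
The whole observation can be reproduced from the two bit patterns alone (a small demonstration; the expected outputs in the comments are the serializations reported above):

    fn main() {
        // The two unit_value bit patterns printed above; they differ by one ulp.
        let stock   = f32::from_bits(0x3bdb8bac); // from the multiply-by-ten parser
        let patched = f32::from_bits(0x3bdb8bad); // from f32::from_str

        // Multiplying back by 100, as to_css() does, lands on different f32s:
        println!("{}", stock * 100.);   // 0.66999996
        println!("{}", patched * 100.); // 0.67
    }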

@federicomenaquintero (Contributor, author)

I'll get some benchmarks. I don't think the loss of precision is worth considering; it's way deep in the fractional part.

@bzbarsky (Contributor)

There are two ways the loss of precision could in theory manifest itself:

  1. Changes in the serialization. If we really are dealing with a single ulp, then that shouldn't affect serialization once we start doing serialization right (to only 6 sig figs).

  2. Changes to web behavior. This is primarily a problem with SVG where people tend to specify small things and then multiply them by large scales. I suspect a single ulp difference would again not be noticeable here.
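
One way "doing serialization right" could look, in the six-significant-figures sense of point 1 (a sketch of the idea, not necessarily the rule the project will adopt):

    // Round an f32 to six significant decimal digits before printing, so
    // that one-ulp neighbors like 0.66999996 and 0.67 serialize identically.
    fn six_sig_figs(v: f32) -> String {
        let rounded = format!("{:.5e}", v);       // e.g. "6.70000e-1": 6 significant digits
        let back: f32 = rounded.parse().unwrap(); // re-read the rounded value
        format!("{}", back)                       // shortest form of that f32
    }

    fn main() {
        println!("{}", six_sig_figs(0.66999996)); // "0.67"
        println!("{}", six_sig_figs(0.67));       // "0.67"
    }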

@bors-servo (Contributor)

☔ The latest upstream changes (presumably #159) made this pull request unmergeable. Please resolve the merge conflicts.

@federicomenaquintero (Contributor, author)

Discarding this for now; I haven't had time to investigate it further.
