
RFC: Use Rust's string-to-number conversions, instead of home-grown ones #156


Closed

Conversation

@federicomenaquintero (Contributor) commented on Jun 14, 2017

Rust's standard library contains pretty sophisticated code for parsing numbers from strings. This pull request modifies the Tokenizer to use it, instead of the trivial "multiply by 10 and add the digit" algorithm.
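
For concreteness, here is a minimal sketch of the two approaches (an illustration only, not the actual Tokenizer code; the real accumulation logic in consume_numeric() differs in detail, which is exactly where rounding differences can creep in):

    use std::str::FromStr;

    fn main() {
        let s = "0.67";

        // Hypothetical digit-accumulation parse, in the spirit of the
        // "multiply by 10 and add the digit" algorithm. Fractional digits
        // are added in with a shrinking power-of-ten factor; because 0.1
        // has no exact binary representation, error can accumulate here.
        let mut integral = 0f64;
        let mut fraction = 0f64;
        let mut factor = 0.1f64;
        let mut in_fraction = false;
        for c in s.chars() {
            match c {
                '.' => in_fraction = true,
                _ => {
                    let d = c.to_digit(10).unwrap() as f64;
                    if in_fraction {
                        fraction += d * factor;
                        factor *= 0.1;
                    } else {
                        integral = integral * 10. + d;
                    }
                }
            }
        }
        let naive = (integral + fraction) as f32;

        // The standard library's parser, which rounds correctly to the
        // nearest representable f32.
        let std_parsed = f32::from_str(s).unwrap();

        println!("naive: {}", naive);
        println!("std:   {}", std_parsed);
    }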

However, this makes the tests fail!

For example, one of the tests parses a ".67%" token out of a bigger component-value-list, and the token has a spec like

	["percentage", "0.66999996", 0.67, "number"], " ",

and it fails with

        [
          "percentage",
result    "0.67",
result    0.6700000166893005,
expect    "0.66999996",
expect    0.6700000000000002,
          "number"
        ],

Some things to note:

  • The test code uses the rustc-serialize crate to read/write JSON, but that crate says it has been deprecated in favor of Serde.
  • rustc-serialize also uses a simple algorithm like rust-cssparser's, parsing numbers by multiplying by 10 and adding each digit.
  • The test code uses rust-cssparser's serializer.rs to generate JSON out of the parsed tokens. It writes the string representations of numbers by doing a simple write!(writer, "{}", float). In the example above, this is what generates the "0.67" in the first result line, but that is different from the test fixture's expected "0.66999996". I am not sure which side to trust :)
  • The result/expected numbers (not just the stringified values) also differ in the example above; note that the JSON code assumes f64, while Token uses f32. Also, almost_equals() in tests.rs requires the absolute difference between result and expected to be < 1e-6 (see the sketch after this list) - I haven't dug deep enough to see why the comparison fails for the example above.
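
For reference, the tolerance check described in the last bullet amounts to something like this (a sketch, not the exact almost_equals() from tests.rs, which walks the JSON structure):

    // Sketch of an absolute-tolerance float comparison like the one
    // described for almost_equals() in tests.rs.
    fn almost_equals(a: f64, b: f64) -> bool {
        (a - b).abs() < 1e-6
    }

    fn main() {
        // The result/expected pair from the failing test above:
        println!("{}", almost_equals(0.6700000166893005, 0.6700000000000002));
    }

On this pair the check prints true (the difference is about 1.7e-8), which suggests the failure may come from comparing the stringified values ("0.67" vs. "0.66999996") rather than the floats themselves.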

I'm fine with this pull request not being merged, since the tests fail - but how were those test values generated in the first place?

Comments on the above are appreciated.


@SimonSapin (Member)

How is the performance of std’s number parsing? This is performance-sensitive, since the CSSOM is based on strings (e.g. JavaScript code like element.style.top = some_number + "px").

@SimonSapin (Member)

CC @bzbarsky

@bzbarsky (Contributor)

I, too, would want to see some performance numbers.

I would also like to know what the rounding behavior is. That is, it sounds like the std algorithm is producing different numbers from the "multiply and add" thing. Why is that? What are the actual underlying f32 values produced from the "0.67" input by the two algorithms?

Note also that serialization of floats in Rust via the {} formatting machinery is unfortunately kinda broken: it tries to output more digits of precision than floats actually have, so you end up with weirdness. See also servo/servo#17205. But in this case, since we were using {} both before and after, it sounds like the actual f32 did in fact change.
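
To make the widening issue concrete: the 0.6700000166893005 in the failing output above is what you get by printing the f32 nearest to 0.67 after casting it to f64 (a small demonstration; the exact digits {} prints may vary across Rust versions):

    fn main() {
        let v: f32 = 0.67;
        println!("{}", v);        // prints "0.67"
        println!("{}", v as f64); // prints "0.6700000166893005":
                                  // widening exposes the f32's exact value,
                                  // and {} then shows all those digits
    }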

@federicomenaquintero (Contributor, author)

I've started https://github.com/federicomenaquintero/float-tests to see what the various parsers generate. The Cargo.toml has URLs for the stock rust-cssparser and my version.

I'm printing ToCss::to_css_string()'s result out of a parsed token, and so far both versions produce the same results. I'm not sure yet why rust-cssparser's tests work in the stock version, nor why the differences happen in that string for 0.67.

I'll look into writing a benchmark for both versions.
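
A minimal shape for such a benchmark might be the following (a sketch using std::time::Instant; the second arm would call into the tokenizer's own parsing path, and a real measurement would want proper repetition and warm-up):

    use std::str::FromStr;
    use std::time::Instant;

    fn main() {
        // Synthetic inputs shaped like CSS numbers, e.g. "123.45".
        let inputs: Vec<String> = (0..1_000_000)
            .map(|i| format!("{}.{:02}", i % 1000, i % 100))
            .collect();

        let start = Instant::now();
        let mut checksum = 0f32; // keep the work from being optimized away
        for s in &inputs {
            checksum += f32::from_str(s).unwrap();
        }
        println!("f32::from_str: {:?} (checksum {})", start.elapsed(), checksum);
    }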

@federicomenaquintero (Contributor, author)

OK, found it.

Summary: for Percentage values, the numbers in the expected JSON fixture have been divided and then multiplied by 100, and that round trip loses precision relative to the originally parsed values.

I put some println!()s right where the float value is built, and then when it is serialized to JSON.

consume_numeric() does this:

        return Percentage(PercentageValue {
            unit_value: (value / 100.) as f32,
            ...
        });

And later, impl ToCss for Token's to_css() does this:

            Token::Percentage(PercentageValue { unit_value, int_value, has_sign }) => {
                let value = NumericValue {
                    value: unit_value * 100.,
                    ...
                };
                try!(write_numeric(value, dest));

where write_numeric() simply does write!(writer, "{}", value).

The original multiply-by-ten code outputs this:

percentage: '.67%' 0.6700000000000002
serializing percentage: 0.66999996

(That's the string slice and the value as parsed; the second line is that value after the divide-by-100 / multiply-by-100 round trip.)

My version with f32::from_str():

percentage: '.67%' 0.67
serializing percentage: 0.67

The test program from my float-tests repo prints this at the end, when used with the stock rust-cssparser:

Token.to_css_string() = "0.66999996%"
Token::Percentage(PercentageValue.value) - value: 0.0067, bits: 111011110110111000101110101100 (0x3bdb8bac)

And when run with my patched version:

Token.to_css_string() = "0.67%"
Token::Percentage(PercentageValue.value) - value: 0.0067000003, bits: 111011110110111000101110101101 (0x3bdb8bad)

The f32s differ by a single bit (one ulp), so maybe the /100 → *100 round trip is not too bad :)
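
The whole observation can be reproduced from the two bit patterns alone (a small demonstration; the expected outputs in the comments are the serializations reported above):

    fn main() {
        // The two unit_value bit patterns printed above; they differ by one ulp.
        let stock   = f32::from_bits(0x3bdb8bac); // from the multiply-by-ten parser
        let patched = f32::from_bits(0x3bdb8bad); // from f32::from_str

        // Multiplying back by 100, as to_css() does, lands on different f32s:
        println!("{}", stock * 100.);   // 0.66999996
        println!("{}", patched * 100.); // 0.67
    }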

@federicomenaquintero (Contributor, author)

I'll get some benchmarks. I don't think the loss of precision is worth considering; it's way deep in the fractional part.

@bzbarsky (Contributor)

There are two ways the loss of precision could in theory manifest itself:

  1. Changes in the serialization. If we really are dealing with a single ulp, then that shouldn't affect serialization once we start doing serialization right (to only 6 sig figs).

  2. Changes to web behavior. This is primarily a problem with SVG where people tend to specify small things and then multiply them by large scales. I suspect a single ulp difference would again not be noticeable here.
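
One way "doing serialization right" could look, in the six-significant-figures sense of point 1 (a sketch of the idea, not necessarily the rule the project will adopt):

    // Round an f32 to six significant decimal digits before printing, so
    // that one-ulp neighbors like 0.66999996 and 0.67 serialize identically.
    fn six_sig_figs(v: f32) -> String {
        let rounded = format!("{:.5e}", v);       // e.g. "6.70000e-1": 6 significant digits
        let back: f32 = rounded.parse().unwrap(); // re-read the rounded value
        format!("{}", back)                       // shortest form of that f32
    }

    fn main() {
        println!("{}", six_sig_figs(0.66999996)); // "0.67"
        println!("{}", six_sig_figs(0.67));       // "0.67"
    }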

@bors-servo (Contributor)

☔ The latest upstream changes (presumably #159) made this pull request unmergeable. Please resolve the merge conflicts.

@federicomenaquintero (Contributor, author)

Discarding this for now; I haven't had time to investigate it further.
