Skip to content

[html] Various performance optimizations #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 24, 2025
Merged

Conversation

moffatman
Copy link
Contributor

Should be twice as fast according to my use cases. Happy to discuss what you need for testing, (micro)benchmarking, preserving public API for old slow functions, or whatever else to merge it. I know I still need to do stuff like bump version, changelog, etc.

This was originally at dart-archive/html#251

That version has proper separation of each commit which may be more helpful, it's been squashed for the repo migration.


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.

Copy link

github-actions bot commented Feb 15, 2025

PR Health

Breaking changes ⚠️
Package Change Current Version New Version Needed Version Looking good?
html Breaking 0.15.5+1 0.15.6 0.16.0
Got "0.15.6" expected >= "0.16.0" (breaking changes)
⚠️

This check can be disabled by tagging the PR with skip-breaking-check.

Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

Coverage ⚠️
File Coverage
pkgs/html/lib/parser.dart 💔 99 % ⬇️ 0 %
pkgs/html/lib/src/constants.dart 💚 93 % ⬆️ 0 %
pkgs/html/lib/src/html_input_stream.dart 💚 80 % ⬆️ 11 %
pkgs/html/lib/src/tokenizer.dart 💚 100 %
pkgs/html/lib/src/treebuilder.dart 💚 95 %
pkgs/html/lib/src/trie.dart 💔 Not covered
pkgs/html/tool/generate_trie.dart 💔 Not covered

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

API leaks ⚠️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
html HtmlTokenizer
Token
HtmlInputStream
TagToken
DoctypeToken
StringToken
TreeBuilder
ActiveFormattingElements
StartTagToken
TagAttribute
CommentToken
CharactersToken
SpaceCharactersToken
EndTagToken

This check can be disabled by tagging the PR with skip-leaking-check.

License Headers ⚠️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
pkgs/html/lib/parser.dart
pkgs/html/lib/src/constants.dart
pkgs/html/lib/src/html_input_stream.dart
pkgs/html/lib/src/tokenizer.dart
pkgs/html/lib/src/treebuilder.dart
pkgs/html/test/parser_feature_test.dart
pkgs/html/test/trie_test.dart
pkgs/html/tool/generate_trie.dart

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/bazel_worker/example/client.dart
pkgs/bazel_worker/example/worker.dart
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart
pkgs/boolean_selector/example/example.dart
pkgs/clock/lib/clock.dart
pkgs/clock/lib/src/clock.dart
pkgs/clock/lib/src/default.dart
pkgs/clock/lib/src/stopwatch.dart
pkgs/clock/lib/src/utils.dart
pkgs/clock/test/clock_test.dart
pkgs/clock/test/default_test.dart
pkgs/clock/test/stopwatch_test.dart
pkgs/clock/test/utils.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/coverage/test/collect_coverage_config_test.dart
pkgs/coverage/test/config_file_locator_test.dart
pkgs/html/example/main.dart
pkgs/html/lib/dom.dart
pkgs/html/lib/dom_parsing.dart
pkgs/html/lib/html_escape.dart
pkgs/html/lib/src/encoding_parser.dart
pkgs/html/lib/src/list_proxy.dart
pkgs/html/lib/src/query_selector.dart
pkgs/html/lib/src/token.dart
pkgs/html/lib/src/utils.dart
pkgs/html/test/dom_test.dart
pkgs/html/test/parser_test.dart
pkgs/html/test/query_selector_test.dart
pkgs/html/test/selectors/level1_baseline_test.dart
pkgs/html/test/selectors/level1_lib.dart
pkgs/html/test/selectors/selectors.dart
pkgs/html/test/support.dart
pkgs/html/test/tokenizer_test.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/stack_trace/example/example.dart
pkgs/watcher/test/custom_watcher_factory_test.dart
pkgs/yaml_edit/example/example.dart

This check can be disabled by tagging the PR with skip-license-check.

@mosuem mosuem requested a review from HosseinYousefi April 17, 2025 12:41
Copy link
Member

@HosseinYousefi HosseinYousefi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! A 2x Improvement sounds great. I had some minor comments and you'll need to up the version to 0.15.6 and add a changelog and it's good to go!

@@ -0,0 +1,18 @@
import 'package:dart_style/dart_style.dart';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should live in html/tool/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}');
final formatted = DartFormatter().format(source);
// trimRight() because print adds its own newline
print(formatted.trimRight());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of printing to stdout, this could override lib/src/trie.dart

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}
final source =
'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a // AUTO GENERATED by 'tool/generate_trie.dart'. DO NOT EDIT! to the beginning of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

for (final entity in entities.keys) {
var node = root;
for (final charCode in entity.codeUnits) {
node = (node[charCode] ??= <int, dynamic>{}) as Map<int, dynamic>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good enough approach. Another way would be to create a 2d array of STATE * ALPHABET which would be around 500KBs of memory but faster to traverse. Best is using double array tries but that's more complicated to implement. A potential fun project for later perhaps since a 2x improvement is great already!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never heard of those, interesting

@@ -424,7 +468,60 @@ const List<String> tableInsertModeElements = [
];

// TODO(jmesserly): remove these in favor of the test functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the TODO comment mentions, a possible improvement here is to have a test function that only checks 2 integer bounds instead of using Set.contains. Could be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it doesn't quite contain all letters right? Since there are some symbols between upper and lowercase letters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then 4 integer bound checks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -77,9 +77,6 @@ On line 4, column 3 of ParseError: Unexpected DOCTYPE. Ignored.
expect(parser.errors.length, 1);
final error = parser.errors[0];
expect(error.errorCode, 'unexpected-doctype');
expect(error.span!.start.line, 3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand why we deleted these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I no longer generate fileInfo unless generateSpans=true, we don't get any error.span any more for this test which has generateSpans=false

HosseinYousefi

This comment was marked as duplicate.

@HosseinYousefi
Copy link
Member

HosseinYousefi commented Apr 17, 2025

Somehow my comments were published twice, so I marked one as duplicated, and now there is only one! So please expand the comments under the first section that is marked as duplicated! I don't know how I can unmark it... Yay! Found an "unhide" option!

'const entitiesTrieRoot = $root;'.replaceAll('{}', '<int, dynamic>{}');
final formatted = DartFormatter().format(source);
final htmlDir = File(Platform.script.path).parent.parent;
File('${htmlDir.path}/lib/src/trie.dart').writeAsStringSync(formatted);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work on windows? You can use join from package:path to make it platform independent:

  File(p.join(htmlDir.path, 'lib', 'src', 'trie.dart').writeAsStringSync(formatted);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it seems that this file is not formatted using dart format causing the CI to complain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

github-actions bot commented Apr 22, 2025

Package publishing

Package Version Status Publish tag (post-merge)
package:bazel_worker 1.1.3 already published at pub.dev
package:benchmark_harness 2.3.1 already published at pub.dev
package:boolean_selector 2.1.2 already published at pub.dev
package:browser_launcher 1.1.3 already published at pub.dev
package:cli_config 0.2.1-wip WIP (no publish necessary)
package:cli_util 0.4.2 already published at pub.dev
package:clock 1.1.2 already published at pub.dev
package:code_builder 4.10.2-wip WIP (no publish necessary)
package:coverage 1.12.0 already published at pub.dev
package:csslib 1.0.2 already published at pub.dev
package:extension_discovery 2.1.0 already published at pub.dev
package:file 7.0.2-wip WIP (no publish necessary)
package:file_testing 3.1.0-wip WIP (no publish necessary)
package:glob 2.1.3 already published at pub.dev
package:graphs 2.3.3-wip WIP (no publish necessary)
package:html 0.15.6 ready to publish html-v0.15.6
package:io 1.1.0-wip WIP (no publish necessary)
package:json_rpc_2 4.0.0 ready to publish json_rpc_2-v4.0.0
package:markdown 7.3.1-wip WIP (no publish necessary)
package:mime 2.0.0 already published at pub.dev
package:oauth2 2.0.4-wip WIP (no publish necessary)
package:package_config 2.3.0-wip WIP (no publish necessary)
package:pool 1.5.2-wip WIP (no publish necessary)
package:pub_semver 2.2.0 already published at pub.dev
package:pubspec_parse 1.5.0 already published at pub.dev
package:source_map_stack_trace 2.1.3-wip WIP (no publish necessary)
package:source_maps 0.10.14-wip WIP (no publish necessary)
package:source_span 1.10.1 already published at pub.dev
package:sse 4.1.8 ready to publish sse-v4.1.8
package:stack_trace 1.12.1 already published at pub.dev
package:stream_channel 2.1.4 already published at pub.dev
package:stream_transform 2.1.2-wip WIP (no publish necessary)
package:string_scanner 1.4.1 already published at pub.dev
package:term_glyph 1.2.3-wip WIP (no publish necessary)
package:test_reflective_loader 0.2.3 already published at pub.dev
package:timing 1.0.2 already published at pub.dev
package:unified_analytics 8.0.1 already published at pub.dev
package:watcher 1.1.1 already published at pub.dev
package:yaml 3.1.3 already published at pub.dev
package:yaml_edit 2.2.2 already published at pub.dev

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

- Format generate_trie.dart
- Use path join() in generate_trie.dart
- Add charsUntilAsciiLetter()
- Move all codes to Charcode {}
This is a noticeable speedup by scanning and avoiding creating a new string first.
@moffatman
Copy link
Contributor Author

I found another 40% speedup via some new commits

String toAsciiLowerCase() =>
String.fromCharCodes(codeUnits.map(_asciiToLower));
String toAsciiLowerCase() {
if (codeUnits.any((c) => c >= Charcode.kUpperA && c <= Charcode.kUpperZ)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These conditions can live as functions: codeUnits.any(isUpperCaseCode)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

static const int kGreaterThan = 0x3E;

/// A
static const int kUpperA = 0x41;
Copy link
Member

@HosseinYousefi HosseinYousefi Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We usually don't use kFooBar when identifying constants in Dart, instead we just use fooBar.
https://dart.dev/effective-dart/style#dont-use-prefix-letters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

: String.fromCharCode(_chars[_offset]);
final charCode = _chars[_offset];
if (charCode < 256) {
return asciiCharacters[charCode];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing this to String.fromCharCode(charCode) affect the performance much?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The improvement is 1% in AoT compilation and 6% in JIT

@@ -107,8 +107,9 @@ class HtmlInputStream {
}
}

final isSurrogatePair =
i + 1 < charsLength && _isLeadSurrogate(c) && _isTrailSurrogate(rawChars[i + 1]);
final isSurrogatePair = _isLeadSurrogate(c) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also order it in the original way but when you think about the fact that most characters aren't surrogate, this makes sense as it saves a check per character! Good find!

@HosseinYousefi HosseinYousefi self-requested a review April 24, 2025 07:41
Copy link
Member

@HosseinYousefi HosseinYousefi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for your contribution!

@HosseinYousefi HosseinYousefi merged commit 11f4cf7 into dart-lang:main Apr 24, 2025
16 checks passed
copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request Apr 28, 2025
Revisions updated by `dart tools/rev_sdk_deps.dart`.

dartdoc (https://github.com/dart-lang/dartdoc/compare/1788710..1a08646):
  1a08646a  2025-04-28  Sam Rawlins  Stop using InterfaceElementImpl2 (dart-lang/dartdoc#4041)
  38b7d2e0  2025-04-28  Sam Rawlins  Stop using ConstructorElementImpl (dart-lang/dartdoc#4040)
  ad65ae89  2025-04-25  Sarah Zakarias  Make redirect pages for categories (dart-lang/dartdoc#4037)
  9ab9ba05  2025-04-23  Parker Lougheed  Migrate remaining styles to use sass nesting (dart-lang/dartdoc#4024)
  aaea6377  2025-04-22  Jonas Finnemann Jensen  Use _category name_ in the URL, instead of the _category displayName_. (dart-lang/dartdoc#4031)
  0388c64e  2025-04-22  Jonas Finnemann Jensen  Do not make a copy of PUB_CACHE, just create an empty one instead. (dart-lang/dartdoc#4032)

tools (https://github.com/dart-lang/tools/compare/11a7719..11f4cf7):
  11f4cf7e  2025-04-24  Callum Moffat  [html] Various performance optimizations (dart-lang/tools#2019)
  98d4e4d9  2025-04-22  Simone Stasi  [html] fix TypeError in nth-child query selector (dart-lang/tools#2015)

Change-Id: Ic5d0a5f7f9393fa1a585b80ec1e014647224f151
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/425200
Auto-Submit: Devon Carew <[email protected]>
Reviewed-by: Konstantin Shcheglov <[email protected]>
Commit-Queue: Konstantin Shcheglov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants