Try to integrate fork of Chili parallel runtime #140206

Draft
wants to merge 5 commits into base: master


zetanumbers
Contributor

@zetanumbers zetanumbers commented Apr 23, 2025

Chili is a Rust implementation of a parallel runtime with heartbeat scheduling. Since I have some experience working on rustc's parallel runtime, I tried a quick integration of Chili into rustc to check the performance. My hacky modifications to Chili consist of:

  • Addition of TLV;
  • Deadlock detection with mark_(un)blocked methods;
  • Addition of another thread pool creation method scoped_with_config to mimic rayon's design;
    • As such, it spawns exactly the number of worker threads specified and no longer considers the parent thread to be a worker;
    • Addition of an install method to run code on the worker threads;
  • Replacement of the global ThreadPool with a thread-local context that respects subroutine calls.
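
To illustrate the rayon-like design described in the list above, here is a minimal sketch using only the standard library. It is not Chili's or rayon's actual code: SketchPool, install, and num_workers are hypothetical names, and the sketch requires 'static closures instead of a real scoped API to stay short.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// A type-erased job that a worker thread runs exactly once.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical pool that spawns exactly `num_threads` workers.
/// The thread that creates the pool is never treated as a worker.
pub struct SketchPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl SketchPool {
    pub fn new(num_threads: usize) -> Self {
        assert!(num_threads > 0);
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..num_threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is a temporary, so it is released
                    // before the job itself starts running.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        SketchPool { sender: Some(sender), workers }
    }

    /// Runs `f` on one of the worker threads and blocks until it returns.
    pub fn install<R, F>(&self, f: F) -> R
    where
        F: FnOnce() -> R + Send + 'static,
        R: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f());
        });
        self.sender.as_ref().unwrap().send(job).unwrap();
        rx.recv().expect("worker dropped the job without a result")
    }

    pub fn num_workers(&self) -> usize {
        self.workers.len()
    }
}

impl Drop for SketchPool {
    fn drop(&mut self) {
        // Closing the channel makes every worker leave its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

With this shape, calling pool.install(|| ...) from the parent thread always executes the closure on a worker, so the parent never counts toward the configured thread number.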

I've also removed the parallel! macro, as I couldn't figure out how to implement it without causing an ambiguity error:

error: local ambiguity when calling macro `parallel`: multiple parsing options: built-in NTs expr ('blocks') or expr ('endblock').
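
As far as I understand, this kind of error appears when a repetition of expr fragments is followed directly by another expr fragment, so the macro parser cannot decide which one the next expression belongs to. The sketch below assumes that shape (the actual macro may differ) and shows one possible workaround: match the first expression on its own, then a separator-led repetition. run_in_parallel! is a hypothetical name, and the body uses std::thread::scope rather than Chili.

```rust
// Assumed ambiguous shape, kept as a comment because invoking it is rejected:
// at an expression, the parser could either enter the repetition (`blocks`)
// or skip it and match the trailing fragment (`endblock`).
//
// macro_rules! ambiguous {
//     ($($blocks:expr,)* $endblock:expr) => { /* ... */ };
// }

// Unambiguous alternative: the first expression is mandatory, and every
// further expression must be introduced by a comma.
macro_rules! run_in_parallel {
    ($first:expr $(, $rest:expr)* $(,)?) => {
        std::thread::scope(|s| {
            // Each remaining expression runs on its own scoped thread.
            $( s.spawn(|| $rest); )*
            // The first expression runs on the calling thread; its value
            // is returned once the scope has joined all spawned threads.
            $first
        })
    };
}
```

The results of the spawned expressions are discarded here; only the value of the first expression is returned.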

Original runtime repo: https://github.com/dragostis/chili

Modifications to Chili to accommodate rustc: dragostis/chili@main...zetanumbers:chili:rustc

Related zulip topic: #t-compiler/parallel-rustc > use heartbeat scheduling to improve parallel frontend

@rustbot
Collaborator

rustbot commented Apr 23, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 23, 2025

@zetanumbers
Contributor Author

zetanumbers commented Apr 23, 2025

Currently this code fails from inside of Chili: dragostis/chili#29
Currently this code just hangs.

@bors
Collaborator

bors commented Apr 23, 2025

☔ The latest upstream changes (presumably #139983) made this pull request unmergeable. Please resolve the merge conflicts.

@zetanumbers
Contributor Author

Prototype integration is ready for benchmarks

@Zoxc
Contributor

Zoxc commented Apr 25, 2025

7 threads:

| Benchmark | Time (before) | Time (after) | Time % | Physical Memory (before) | Physical Memory (after) | Physical Memory % | Committed Memory (before) | Committed Memory (after) | Committed Memory % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check | 0.4688s | 0.6971s | 💔 48.70% | 200.59 MiB | 200.55 MiB | -0.02% | 278.42 MiB | 276.75 MiB | -0.60% |
| 🟣 hyper:check | 0.1370s | 0.1852s | 💔 35.20% | 127.18 MiB | 126.30 MiB | -0.69% | 200.79 MiB | 199.72 MiB | -0.53% |
| 🟣 regex:check | 0.2954s | 0.4575s | 💔 54.88% | 161.01 MiB | 159.40 MiB | -1.00% | 229.10 MiB | 227.50 MiB | -0.70% |
| 🟣 syn:check | 0.5696s | 0.7758s | 💔 36.20% | 194.07 MiB | 192.92 MiB | -0.59% | 264.84 MiB | 263.13 MiB | -0.64% |
| Total | 1.4707s | 2.1155s | 💔 43.84% | 682.84 MiB | 679.17 MiB | -0.54% | 973.16 MiB | 967.10 MiB | -0.62% |
| Summary | 1.0000s | 1.4375s | 💔 43.75% | 1 byte | 0.99 bytes | -0.57% | 1 byte | 0.99 bytes | -0.62% |

| Benchmark | Time (before) | Time (after) | Time % | Physical Memory (before) | Physical Memory (after) | Physical Memory % | Committed Memory (before) | Committed Memory (after) | Committed Memory % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check:unchanged | 0.3761s | 0.3591s | 💚 -4.51% | 133.04 MiB | 132.72 MiB | -0.24% | 234.70 MiB | 234.40 MiB | -0.13% |
| 🟣 hyper:check:unchanged | 0.1509s | 0.1418s | 💚 -5.99% | 85.57 MiB | 85.37 MiB | -0.23% | 191.57 MiB | 191.32 MiB | -0.13% |
| 🟣 regex:check:unchanged | 0.2882s | 0.2698s | 💚 -6.39% | 112.93 MiB | 112.51 MiB | -0.37% | 212.20 MiB | 211.58 MiB | -0.29% |
| 🟣 syn:check:unchanged | 0.5831s | 0.5616s | 💚 -3.69% | 157.73 MiB | 157.05 MiB | -0.43% | 260.27 MiB | 259.25 MiB | -0.39% |
| Total | 1.3982s | 1.3323s | 💚 -4.71% | 489.27 MiB | 487.65 MiB | -0.33% | 898.74 MiB | 896.54 MiB | -0.25% |
| Summary | 1.0000s | 0.9486s | 💚 -5.14% | 1 byte | 1.00 bytes | -0.32% | 1 byte | 1.00 bytes | -0.24% |

2 threads:

| Benchmark | Time (before) | Time (after) | Time % | Physical Memory (before) | Physical Memory (after) | Physical Memory % | Committed Memory (before) | Committed Memory (after) | Committed Memory % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check | 0.8644s | 1.0451s | 💔 20.90% | 193.95 MiB | 194.38 MiB | 0.22% | 265.88 MiB | 264.76 MiB | -0.42% |
| 🟣 hyper:check | 0.1822s | 0.2146s | 💔 17.73% | 124.16 MiB | 124.16 MiB | -0.00% | 197.65 MiB | 197.43 MiB | -0.11% |
| 🟣 regex:check | 0.5183s | 0.6032s | 💔 16.37% | 155.93 MiB | 155.86 MiB | -0.05% | 223.65 MiB | 223.58 MiB | -0.03% |
| 🟣 syn:check | 0.8861s | 0.9679s | 💔 9.23% | 187.97 MiB | 188.41 MiB | 0.23% | 258.04 MiB | 258.30 MiB | 0.10% |
| Total | 2.4511s | 2.8307s | 💔 15.49% | 662.02 MiB | 662.81 MiB | 0.12% | 945.22 MiB | 944.06 MiB | -0.12% |
| Summary | 1.0000s | 1.1606s | 💔 16.06% | 1 byte | 1.00 bytes | 0.10% | 1 byte | 1.00 bytes | -0.12% |

| Benchmark | Time (before) | Time (after) | Time % | Physical Memory (before) | Physical Memory (after) | Physical Memory % | Committed Memory (before) | Committed Memory (after) | Committed Memory % |
|---|---|---|---|---|---|---|---|---|---|
| 🟣 clap:check:unchanged | 0.3342s | 0.3383s | 💔 1.22% | 132.49 MiB | 132.61 MiB | 0.09% | 234.06 MiB | 234.17 MiB | 0.05% |
| 🟣 hyper:check:unchanged | 0.1370s | 0.1392s | 💔 1.58% | 84.94 MiB | 85.03 MiB | 0.11% | 190.79 MiB | 190.84 MiB | 0.03% |
| 🟣 regex:check:unchanged | 0.2531s | 0.2542s | 0.47% | 112.21 MiB | 111.99 MiB | -0.20% | 211.32 MiB | 211.05 MiB | -0.13% |
| 🟣 syn:check:unchanged | 0.5317s | 0.5372s | 💔 1.03% | 156.78 MiB | 156.85 MiB | 0.04% | 259.15 MiB | 258.92 MiB | -0.09% |
| Total | 1.2559s | 1.2689s | 💔 1.03% | 486.42 MiB | 486.48 MiB | 0.01% | 895.31 MiB | 894.98 MiB | -0.04% |
| Summary | 1.0000s | 1.0108s | 💔 1.08% | 1 byte | 1.00 bytes | 0.01% | 1 byte | 1.00 bytes | -0.03% |
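
For readers checking the numbers: the % columns appear to be the relative change from the before value to the after value. The small helper below is only an assumed reconstruction of that arithmetic (percent_change is a hypothetical name, not part of rcb). Because the displayed times are rounded, a recomputed value can differ from the table in the last digit.

```rust
/// Relative change from `before` to `after`, in percent.
/// Positive means the "after" build is slower or uses more memory.
pub fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For example, the 7-thread Total row gives percent_change(1.4707, 2.1155), which is roughly 43.84.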

@zetanumbers
Contributor Author

zetanumbers commented Apr 25, 2025

Hm, well, I forgot to mention that there's no jobserver integration yet, so it's only relevant to benchmark a single rustc run rather than a whole cargo check. Sorry about that. I would like to add this next.

Also, we should keep in mind the constant parameter on join_with_heartbeat_every, which determines how many join operations actually run in parallel, unless the worker's job queue is almost empty (len < 3) at that time. The plain join sets it to one out of every 64 joins, which is bad for heavy code chunks.
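
To make the heuristic concrete, here is a rough sketch of the decision as I described it above. It is an illustration only, not Chili's actual code: JoinPolicy and should_share are hypothetical names, and the exact counter handling is an assumption.

```rust
/// Hypothetical per-worker state for deciding which joins run in parallel.
pub struct JoinPolicy {
    /// One join out of every `every` calls is offered to other workers.
    every: u32,
    /// Position inside the current window of `every` joins.
    count: u32,
}

impl JoinPolicy {
    pub fn new(every: u32) -> Self {
        assert!(every > 0);
        JoinPolicy { every, count: 0 }
    }

    /// Returns true when this join should be made available to other
    /// workers instead of running both closures sequentially.
    pub fn should_share(&mut self, queue_len: usize) -> bool {
        // The first join of each window is the one that gets shared.
        let due = self.count == 0;
        self.count = (self.count + 1) % self.every;
        // Assumption: an almost empty queue (len < 3) always shares,
        // regardless of the counter.
        due || queue_len < 3
    }
}
```

With every set to 64 and a well-filled queue, only 2 out of 128 consecutive joins would be shared, which is why a few heavy closures can end up running sequentially.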

And do not worry about doing benchmarks for me, unless you want to. I may not have time to collect benchmarks before the day ends, so I'm just posting the progress as it stands. :)

And thank you for your measurements!

@Zoxc
Contributor

Zoxc commented Apr 25, 2025

> Hm, well I forgot to mention there's no jobserver integrated yet, so it's only relevant to benchmark a single rustc run, instead of a whole cargo check

The benchmarks are for a single rustc run. I'm using rcb (rcb bench --check --incr-none -n 10 --details none --rflag=a:-Zthreads=7 --threads=a master~~e master~chili~8).

The unchanged results are interesting, as the improvement doesn't seem to come from reduced per-thread overhead, which would show up with 2 threads too.

6 participants