-
-
Notifications
You must be signed in to change notification settings - Fork 340
Implement multithreading for snapshot generation #7987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement multithreading for snapshot generation #7987
Conversation
joshuawarner32
commented
Jul 9, 2025
- Extract commonality in snapshot tests between single-file and multi-file snapshots
- Implement multithreading for snapshot generation
e5a4a0c
to
0590de8
Compare
src/base/parallel.zig
Outdated
const Thread = std.Thread; | ||
|
||
/// Maximum number of threads that can be spawned | ||
const MAX_THREADS = 128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Systems can have more cores then this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify the setup and have a stack-allocated array of reasonable size. But fair point, it's not hard to fix up.
src/base/parallel.zig
Outdated
max_threads: usize = 0, | ||
|
||
/// Force single-threaded execution | ||
single_threaded: bool = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use max_threads = 1
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind there's a distinction between "use one [other] thread", and "use this thread". Usually not important, but for example being on the main thread can be important for certain things on macos. Also somewhat simplifies interaction with a debugger. But yeah, max_threads=1 makes sense and is a smaller api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I was thing that max_threads=1 would just skip spawning the threads.
src/base/parallel.zig
Outdated
return struct { | ||
allocator: Allocator, | ||
work_items: []const T, | ||
index: *std.atomic.Value(usize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make an AtomicUsize
alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of putting this in base necessarily. Different concurrent work would want to be mapped differently. Though maybe this is the dfault type for large compiler work and fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a chunk size setting could also be added later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different concurrent work would want to be mapped differently.
100% agree. I'm not intending this to be a be-all-end-all; just serve this particular simple need at the moment.
src/base/parallel.zig
Outdated
/// Internal worker thread context | ||
fn WorkerContext(comptime T: type) type { | ||
return struct { | ||
allocator: Allocator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a guarantee that all allocators are thread safe. Though I guess all the ones we use are...so 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why Rust is nice 😭
src/snapshot.zig
Outdated
@@ -115,6 +117,8 @@ pub fn main() !void { | |||
generate_html = true; | |||
} else if (std.mem.eql(u8, arg, "--debug")) { | |||
debug_mode = true; | |||
} else if (std.mem.eql(u8, arg, "--singlethread")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: --single-threaded
?
Also probably add an arg to set the number of threads. Like I often want one core free so my cpu doesn't hang up from parallel workers like this.
const duration_ms = timer.read() / std.time.ns_per_ms; | ||
|
||
std.log.info("processed {d} snapshots in {d} ms.", .{ total_success, duration_ms }); | ||
std.log.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ditto with the collected part of this log. I think it should be a debug log only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was very useful in that it made it clearly obvious to me that it was not worth trying to optimize the collection step. I'm tempted to leave it in in some respect, but open to rewording / simplifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is snapshot.zig
...so it doesn't really matter. Just feels like info that would be behind a --time
flag to get extra detail.
// Parse the content | ||
var ast = parse.parse(&module_env, roc_content); | ||
defer ast.deinit(allocator); | ||
// Parse the source code based on node type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes around here don't feel like they belong in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to break this out into a separate PR (it's already a separate commit). This is just unifying a couple pieces of redundant code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense. Sounds fine. I just wanted to make sure we didn't pull in accidental changes.
src/snapshot.zig
Outdated
.max_threads = 0, // Auto-detect | ||
}; | ||
|
||
const result = try parallel.processParallel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we rename the function to just proccess. so it is called as parallel.process
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for any other duplicate names like that.
I think some form of this is likely worth merging especially for high level dependency free stuff like format and snapshots. |
704d16c
to
50ec5cd
Compare
21efa1e
to
697a712
Compare
Anyway, all my comments are definitely minor. Clean up as you please and feel free to just merge. |