Skip to content

Implement multithreading for snapshot generation #7987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

joshuawarner32
Copy link
Collaborator

  • Extract commonality in snapshot tests between single-file and multi-file snapshots
  • Implement multithreading for snapshot generation

@joshuawarner32 joshuawarner32 force-pushed the multithread-snapshots branch from e5a4a0c to 0590de8 Compare July 9, 2025 23:37
bhansconnect
bhansconnect previously approved these changes Jul 10, 2025
const Thread = std.Thread;

/// Maximum number of threads that can be spawned
const MAX_THREADS = 128;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Systems can have more cores then this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify the setup and have a stack-allocated array of reasonable size. But fair point, it's not hard to fix up.

max_threads: usize = 0,

/// Force single-threaded execution
single_threaded: bool = false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use max_threads = 1 for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind there's a distinction between "use one [other] thread", and "use this thread". Usually not important, but for example being on the main thread can be important for certain things on macos. Also somewhat simplifies interaction with a debugger. But yeah, max_threads=1 makes sense and is a smaller api.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I was thing that max_threads=1 would just skip spawning the threads.

return struct {
allocator: Allocator,
work_items: []const T,
index: *std.atomic.Value(usize),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make an AtomicUsize alias?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of putting this in base necessarily. Different concurrent work would want to be mapped differently. Though maybe this is the dfault type for large compiler work and fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a chunk size setting could also be added later

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different concurrent work would want to be mapped differently.

100% agree. I'm not intending this to be a be-all-end-all; just serve this particular simple need at the moment.

/// Internal worker thread context
fn WorkerContext(comptime T: type) type {
return struct {
allocator: Allocator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a guarantee that all allocators are thread safe. Though I guess all the ones we use are...so 🤷

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why Rust is nice 😭

src/snapshot.zig Outdated
@@ -115,6 +117,8 @@ pub fn main() !void {
generate_html = true;
} else if (std.mem.eql(u8, arg, "--debug")) {
debug_mode = true;
} else if (std.mem.eql(u8, arg, "--singlethread")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: --single-threaded?

Also probably add an arg to set the number of threads. Like I often want one core free so my cpu doesn't hang up from parallel workers like this.

const duration_ms = timer.read() / std.time.ns_per_ms;

std.log.info("processed {d} snapshots in {d} ms.", .{ total_success, duration_ms });
std.log.info(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ditto with the collected part of this log. I think it should be a debug log only.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was very useful in that it made it clearly obvious to me that it was not worth trying to optimize the collection step. I'm tempted to leave it in in some respect, but open to rewording / simplifying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is snapshot.zig...so it doesn't really matter. Just feels like info that would be behind a --time flag to get extra detail.

// Parse the content
var ast = parse.parse(&module_env, roc_content);
defer ast.deinit(allocator);
// Parse the source code based on node type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes around here don't feel like they belong in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to break this out into a separate PR (it's already a separate commit). This is just unifying a couple pieces of redundant code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Sounds fine. I just wanted to make sure we didn't pull in accidental changes.

src/snapshot.zig Outdated
.max_threads = 0, // Auto-detect
};

const result = try parallel.processParallel(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename the function to just proccess. so it is called as parallel.process?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for any other duplicate names like that.

@bhansconnect
Copy link
Member

I think some form of this is likely worth merging especially for high level dependency free stuff like format and snapshots.

@joshuawarner32 joshuawarner32 force-pushed the multithread-snapshots branch 3 times, most recently from 704d16c to 50ec5cd Compare July 10, 2025 19:09
@joshuawarner32 joshuawarner32 force-pushed the multithread-snapshots branch from 21efa1e to 697a712 Compare July 10, 2025 22:55
@bhansconnect
Copy link
Member

Anyway, all my comments are definitely minor. Clean up as you please and feel free to just merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants