@Menkib64
Contributor

This is a major refactoring of the search implementation. The main focus is to reduce the cost of scheduling tasks. It includes many threading and memory management changes toward that goal.

I expect that there will be a few minor conflicts with #2312.

Search needs to rapidly allocate memory that is freed automatically
when a new search iteration starts. This adds a custom allocator to
manage that memory.

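One common way to get fast allocation with bulk release at an iteration boundary is a bump (arena) allocator. The following is only a minimal sketch of that general technique, not the allocator added by this PR: `IterationArena`, `New`, `Reset`, `block_count`, and the block size are all hypothetical names and values chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump allocator (illustrative only, not the PR's class).
// Memory is handed out linearly and released all at once by Reset().
class IterationArena {
 public:
  explicit IterationArena(std::size_t block_size = 64 * 1024)
      : block_size_(block_size) {}

  // Returns `size` bytes aligned to `align`, which must be a power of two.
  void* Allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const auto mask = static_cast<std::uintptr_t>(align) - 1;
    while (current_ < blocks_.size()) {
      Block& b = blocks_[current_];
      const auto base = reinterpret_cast<std::uintptr_t>(b.data.get());
      // Align the real address, not just the offset inside the block.
      const std::size_t begin = ((base + offset_ + mask) & ~mask) - base;
      if (begin <= b.size && size <= b.size - begin) {
        offset_ = begin + size;
        return b.data.get() + begin;
      }
      ++current_;  // This block is full; try the next retained block.
      offset_ = 0;
    }
    // No retained block fits: grow by one block that is large enough.
    const std::size_t capacity = std::max(block_size_, size + align);
    blocks_.push_back(Block{std::make_unique<unsigned char[]>(capacity),
                            capacity});
    return Allocate(size, align);
  }

  // Constructs a T in the arena. Destructors never run, so T must be
  // trivially destructible.
  template <typename T, typename... Args>
  T* New(Args&&... args) {
    static_assert(std::is_trivially_destructible_v<T>);
    return new (Allocate(sizeof(T), alignof(T)))
        T(std::forward<Args>(args)...);
  }

  // Called when a new iteration starts: everything allocated so far becomes
  // invalid, and the blocks are kept for reuse.
  void Reset() {
    current_ = 0;
    offset_ = 0;
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  struct Block {
    std::unique_ptr<unsigned char[]> data;
    std::size_t size;
  };
  std::vector<Block> blocks_;
  std::size_t block_size_;
  std::size_t current_ = 0;  // Index of the block being filled.
  std::size_t offset_ = 0;   // First free byte inside that block.
};
```

This sketch is single-threaded. A search with many worker threads would additionally need per-thread arenas or atomic bumping, which is outside what the sketch shows.
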
Refactoring GatherMinibatch aims to improve scaling to more threads.
Task scheduling becomes faster, and the number of tasks can scale up
dynamically. These changes are intended to be Elo neutral when the GPU
is the bottleneck. When the CPU is the bottleneck, there is a minor
improvement. The plan is to base future CPU performance improvements on
these changes.

This fixes a performance issue that I noticed when testing a tinygyal vs
master tinygyal match.
@Menkib64 force-pushed the dag_scaling_improvements_v3 branch from 3ce9244 to 3898767 on October 15, 2025 18:50
The kernel might suspend a thread while it is scheduling new tasks. This
could lead to a case where another writer writes a newer task to the
same bucket in the meantime. Compare-exchange avoids overwriting the
newer pointer before the competing reader has a chance to read it.
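
The race described above can be illustrated with a single atomic slot. This is a sketch under assumptions, since the PR's actual bucket layout is not shown here: `Task`, `Bucket`, `TryPublish`, `Take`, and `Peek` are hypothetical names, and an empty bucket is assumed to hold `nullptr`.

```cpp
#include <atomic>

// Hypothetical task type, for illustration only.
struct Task {
  int id;
};

// Hypothetical single-slot bucket. A writer publishes a task pointer and a
// reader takes it. The slot is assumed to hold nullptr when empty.
class Bucket {
 public:
  // Stores `task` only if the slot still holds `expected`. A writer that was
  // suspended after observing the slot fails here instead of overwriting a
  // newer pointer that another writer published in the meantime.
  bool TryPublish(Task* expected, Task* task) {
    return slot_.compare_exchange_strong(expected, task,
                                         std::memory_order_release,
                                         std::memory_order_relaxed);
  }

  // Reader side: takes whatever is in the slot and leaves it empty.
  Task* Take() { return slot_.exchange(nullptr, std::memory_order_acquire); }

  // Observes the slot without modifying it.
  Task* Peek() const { return slot_.load(std::memory_order_acquire); }

 private:
  std::atomic<Task*> slot_{nullptr};
};
```

With a plain `store`, a writer resuming after a suspension would replace the newer task unconditionally. With `compare_exchange_strong`, the stale writer sees the failure and can retry or pick another bucket.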