Smart and user-controllable automatic Constant memory space placement for static variables #218

Open
LegNeato opened this issue May 26, 2025 · 2 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@LegNeato
Contributor

#217 made codegen default to not using constant memory and added an opt-in flag to enable it. But in the interest of performance we should try to use constant memory automatically as much as possible.

As it stands, turning on constant memory via the flag can blow up because the constant memory placement logic isn't fully correct. Ideally we keep track of what we have already put into constant memory and spill once it fills up, instead of only spilling when a single static is too large on its own. We should also sometimes hard error rather than spill. For example, if you manually annotate a bunch of things that together exceed the limit, the last annotation should fail at compile time.
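
A minimal sketch of the bookkeeping described above, not the actual codegen implementation. The names `StaticInfo`, `ConstantBudget`, and `OverBudget` are hypothetical, alignment and padding are ignored, and statics are assumed to be visited in a fixed order (a real implementation would probably place manually annotated statics first). The 64 KiB capacity is the total constant memory size on CUDA devices.

```rust
/// Total constant memory available on CUDA devices (64 KiB).
const CONSTANT_MEMORY_BYTES: usize = 64 * 1024;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressSpace {
    Constant,
    Global,
}

/// Hypothetical description of a static as seen by codegen.
struct StaticInfo<'a> {
    name: &'a str,
    size: usize,
    /// True if the user explicitly annotated this static as constant.
    manual_constant: bool,
}

/// Error for a manually annotated static that no longer fits.
#[derive(Debug, PartialEq, Eq)]
struct OverBudget {
    name: String,
    requested: usize,
    remaining: usize,
}

/// Tracks how many bytes of constant memory have been handed out so far.
struct ConstantBudget {
    used: usize,
    capacity: usize,
}

impl ConstantBudget {
    fn new(capacity: usize) -> Self {
        Self { used: 0, capacity }
    }

    /// Place one static, taking everything placed before it into account.
    fn place(&mut self, s: &StaticInfo) -> Result<AddressSpace, OverBudget> {
        let remaining = self.capacity - self.used;
        if s.size <= remaining {
            self.used += s.size;
            return Ok(AddressSpace::Constant);
        }
        if s.manual_constant {
            // The user asked for constant memory explicitly: fail loudly
            // instead of silently spilling.
            return Err(OverBudget {
                name: s.name.to_string(),
                requested: s.size,
                remaining,
            });
        }
        // Automatically placed statics spill to global memory.
        Ok(AddressSpace::Global)
    }
}
```

In this sketch an unannotated static that does not fit spills to global, while an annotated one produces an error that codegen could surface as a compile-time failure.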

We'll also probably want some packing strategy controlled by the user. For example, if you have one large static and many small ones, you might want either all the small ones or just the big one in constant memory, depending on your workload. We need some design work around this, and the design shouldn't require code to be annotated, so that it supports third-party non-GPU-aware libraries.

Some designs off the top of my head:

  • Pack blindly by default, use the existing flag to turn automatic placement off, and then rely on annotations and flags (?) to place. The flags could take a type name / path or an entire crate.
  • Different built-in packing and spilling strategies controlled by flags (largest first, smallest first, etc.)
  • Use Externally Implementable Items (rust-lang/rust-project-goals#254) to hand control to user code?
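
To make the second bullet concrete, here is a rough sketch of flag-selectable packing strategies. `PackingStrategy` and `pack` are hypothetical names, not existing flags or APIs, and each static is reduced to a `(name, size)` pair with alignment ignored.

```rust
/// Hypothetical strategies a flag could select between.
#[derive(Debug, Clone, Copy)]
enum PackingStrategy {
    DeclarationOrder,
    LargestFirst,
    SmallestFirst,
}

/// Returns the names of the statics chosen for constant memory.
/// Everything not returned would spill to global memory.
fn pack<'a>(
    statics: &[(&'a str, usize)],
    capacity: usize,
    strategy: PackingStrategy,
) -> Vec<&'a str> {
    let mut order: Vec<(&'a str, usize)> = statics.to_vec();
    match strategy {
        PackingStrategy::DeclarationOrder => {}
        // Stable sorts keep declaration order among equally sized statics.
        PackingStrategy::LargestFirst => order.sort_by(|a, b| b.1.cmp(&a.1)),
        PackingStrategy::SmallestFirst => order.sort_by_key(|s| s.1),
    }

    let mut used = 0;
    let mut chosen = Vec::new();
    for (name, size) in order {
        // Greedy: take the static if it still fits, otherwise skip it.
        if used + size <= capacity {
            used += size;
            chosen.push(name);
        }
    }
    chosen
}
```

With one 60 KiB static and three 4 KiB statics in a 64 KiB budget, largest-first keeps the big static plus one small one, while smallest-first keeps all three small ones and spills the big one. That is the workload-dependent trade-off described above.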
@LegNeato LegNeato added the enhancement New feature or request label May 26, 2025
@LegNeato LegNeato changed the title Smart and user-controllable automatic constant memory placement for static variables Smart and user-controllable automatic Constant memory space placement for static variables May 26, 2025
@vlovich
vlovich commented May 27, 2025

I feel like the approach taken with respect to statics (at least the ideas outlined here & in https://rust-gpu.github.io/blog/2025/05/27/rust-cuda-update/) can be improved.

One challenge with the automated approach is that if I import a crate that uses cuda_std::address_space(constant), I'm suddenly contending with that crate for constant memory and have no option but to patch it and maintain a fork if I don't want that part to use constant space. The same goes for automatic spilling: if I don't use manual annotations and I import a crate, what ends up in constant memory changes based on the heuristics, and the performance of my code changes without me being able to do much about it other than tune heuristics & pray.

Have you considered having a separate config file / part of build.rs that can override the placement for any piece of data in any crate? By default everything could be placed in global, and developers would be responsible for specifying by hand, at the outermost crate, the pieces of data they want put into constant space (or for overriding the address space annotation within a crate).
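
A sketch of what such an override table could look like, assuming rules are keyed by a crate name or a fully qualified item path and that the most specific matching rule wins. `PlacementOverrides`, `set`, and `resolve` are hypothetical names for illustration, not an existing build.rs API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressSpace {
    Constant,
    Global,
}

/// Hypothetical override table, e.g. built in build.rs or parsed from a
/// config file in the outermost crate.
#[derive(Default)]
struct PlacementOverrides {
    rules: HashMap<String, AddressSpace>,
}

impl PlacementOverrides {
    /// Add a rule for a whole crate ("dep") or a single item ("dep::mod::X").
    fn set(&mut self, path: &str, space: AddressSpace) -> &mut Self {
        self.rules.insert(path.to_string(), space);
        self
    }

    /// Resolve the placement of a static given its fully qualified path.
    /// `annotated` is what the defining crate requested via attribute, if any.
    fn resolve(&self, path: &str, annotated: Option<AddressSpace>) -> AddressSpace {
        // Walk from the full path up to the crate name, so the longest
        // matching prefix (the most specific rule) wins.
        let mut prefix = path;
        loop {
            if let Some(&space) = self.rules.get(prefix) {
                return space;
            }
            match prefix.rfind("::") {
                Some(idx) => prefix = &prefix[..idx],
                None => break,
            }
        }
        // No override: honor the crate's own annotation, else default to global.
        annotated.unwrap_or(AddressSpace::Global)
    }
}
```

Under these assumptions, a rule for a whole dependency can force its annotated statics back to global, while a more specific rule can still pull one hot table into constant memory.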

That way there are no surprising performance changes just from importing a crate, and you have fine-grained control over what's in constant memory at a global level. Additionally, if you do go down the automatic heuristics route, I feel like profile-guided feedback is the only way to make it work reliably and maintainably for your users: the layout stays consistent across code changes, it is optimized for the data most frequently used under some representative load, and users can test the impact of putting different things in constant space.

Honestly, I'd start with a usable system for manual management before focusing on the automated approach, as people doing this kind of optimization probably feel more comfortable with manual control. The "make it work" default use case is satisfied by using constant memory until it's exceeded & then just spilling everything to global. Data selected more or less at random by packing will only perform well for crates where your heuristics happen to line up.
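
One reading of the "make it work" default above is an all-or-nothing rule: if every static fits in constant memory together, use it for all of them, otherwise put everything in global. A minimal sketch under that assumption (`place_all_or_nothing` is a hypothetical name, and alignment is ignored):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressSpace {
    Constant,
    Global,
}

/// Decide a single address space for all statics at once.
/// `sizes` holds the size in bytes of every static in the final binary.
fn place_all_or_nothing(sizes: &[usize], capacity: usize) -> AddressSpace {
    let total: usize = sizes.iter().sum();
    if total <= capacity {
        AddressSpace::Constant
    } else {
        // Over budget: spill everything rather than pick winners heuristically.
        AddressSpace::Global
    }
}
```

The appeal of this rule is predictability: adding a dependency can flip the outcome, but it never silently reshuffles which statics end up in constant memory.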

@LegNeato LegNeato added help wanted Extra attention is needed good first issue Good for newcomers labels May 27, 2025
@LegNeato
Contributor Author

LegNeato commented May 27, 2025

💯 ! That's what this issue is about 😄

Currently there are two modes:

  1. No automatic placement (the default, no surprises)
  2. Opt-in to (dumb) automatic placement (manually set, so still no surprises)

So the default is everything in global plus manual placement using attributes.

For most kernels, the dumb placement will work and possibly be a performance improvement. If you turn it on and it is a problem, it is easy to turn off.

The main issues come if you have it on, it works, and later you add statics or deps with statics and it starts failing. That is definitely a problem! And that is what this issue intends to fix, by giving you the tools to do what you want regardless of your own code or deps. The next obvious step would be to do as you say: keep track across all statics and then spill. That is the first bullet in the blog post, and a bigger change. Then we want a way for the user to control the placement for 3rd party code (we have the attributes for 1st party code).

Agreed that we'll need to make sure we can override manual placement attributes in 3rd party crates, as they could use the attribute to place statics and potentially cause problems / override the desires of the kernel they are being used in. Overriding 3rd party choices could have interesting perf implications, as the surrounding code may assume that something is in Constant memory vs Global (but I would bet that would be super, super rare). This was punted because there are currently no crates that set this. Pretty much all deps are GPU-unaware for now (we intend to change this within the year / once we are more stable).
