Single source for both CPU and GPU code possible? #49

Open
gzz2000 opened this issue Mar 1, 2022 · 7 comments

@gzz2000

gzz2000 commented Mar 1, 2022

Hello. Thanks for this awesome project. I can now compile CUDA kernels written in Rust into PTX in one Cargo package and use them in another package. Now I wonder whether it is possible to write both the kernels and the CPU code within one package, or even one Rust source file. For example, it might look like this:

// shared code between cpu and gpu. struct definitions may also be shared
// similar to CUDA's __device__ __host__
#[devicehost like thing...]
pub fn adder_both_cpu_gpu(a: f32, b: f32) -> f32 {
    a + b
}

#[kernel]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = adder_both_cpu_gpu(a[idx], b[idx]);
    }
}

fn main() {
    // use kernel fn add
}

Unfortunately, this does not seem possible at the moment. However, I think it is mostly a matter of some convenience macros. For example, the kernel macro could, instead of directly marking the function as a kernel, launch a separate cargo build with cuda_builder and replace the function on the CPU side with a lazy_static PTX module import. This would make it easier to manage dependencies and reuse code between the CPU and GPU sides, and it would save some boilerplate.
I don't know if you are interested... though it might be harder than I think. :(
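To make the idea concrete, here is a minimal sketch (not an existing API) of what such a macro could expand to on the CPU side, assuming the cust crate is used for launching, that build.rs has already written kernels.ptx into OUT_DIR via a nested cuda_builder run, and that the kernel has the add signature shown above; a real expansion would presumably cache the loaded module (for example behind lazy_static, as suggested) instead of reloading it on every call:

use cust::prelude::*;

// PTX produced by a nested cuda_builder run in build.rs; this path is an assumption.
static PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernels.ptx"));

// Hypothetical generated wrapper for launching the `add` kernel from the CPU side.
// Requires an active CUDA context (e.g. from cust::quick_init()).
fn launch_add(
    stream: &Stream,
    a: &DeviceBuffer<f32>,
    b: &DeviceBuffer<f32>,
    c: &mut DeviceBuffer<f32>,
) -> cust::error::CudaResult<()> {
    let module = Module::from_ptx(PTX, &[])?;
    let add = module.get_function("add")?;
    let block = 256u32;
    let grid = (a.len() as u32 + block - 1) / block;
    unsafe {
        // Slice arguments are passed to the kernel as (pointer, length) pairs.
        cust::launch!(add<<<grid, block, 0, stream>>>(
            a.as_device_ptr(),
            a.len(),
            b.as_device_ptr(),
            b.len(),
            c.as_device_ptr()
        ))?;
    }
    stream.synchronize()
}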

@rcarson3

rcarson3 commented Mar 6, 2022

I'd actually asked about this on the science-and-ai Rust Discord channel with regard to a no-std crate. I'm just rehashing some of that here, but @RDambrosio016 seemed to suggest it might be possible if you used #[cfg(target...)] for the things specific to CUDA kernel calling and made use of per-target dependencies. However, it seemed like there might be issues with the build.rs file deadlocking itself due to a cyclic dependency. One of the oxide-enzyme team members suggested you could get around that by checking whether you were on the first or second invocation of build.rs; they did this by creating a file during the first invocation and using it as the check.

I haven't had the time to check whether what was proposed there would actually work, though, as I've still been working on the design of the library.
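For reference, a minimal sketch of that marker-file trick, assuming the kernels live in this same crate and that cuda_builder's nested cargo invocation runs this build script a second time; the marker location next to Cargo.toml and the kernels.ptx output path are assumptions, and a real setup would still need to handle cleanup on failure and cargo's rerun-if-changed rules:

// build.rs
use std::{env, fs, path::PathBuf};

fn main() {
    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    // Marker written by the first (host) invocation; the nested GPU-side
    // invocation sees it and returns early instead of recursing forever.
    let marker = manifest_dir.join(".ptx-build-in-progress");
    if marker.exists() {
        return;
    }
    fs::write(&marker, "").unwrap();

    let ptx_out = PathBuf::from(env::var("OUT_DIR").unwrap()).join("kernels.ptx");
    cuda_builder::CudaBuilder::new(&manifest_dir)
        .copy_to(&ptx_out)
        .build()
        .expect("failed to build PTX");

    let _ = fs::remove_file(&marker);
}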

@beepster4096
Contributor

Couldn't you also check the CARGO_CFG_TARGET_ARCH var in the build script to avoid the deadlock?
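That could look something like this; a minimal sketch assuming the GPU pass is compiled for an nvptx64 target, with the actual cuda_builder call elided:

// build.rs
use std::env;

fn main() {
    // Cargo exposes the target architecture to build scripts via this variable;
    // for the GPU pass it would be something like "nvptx64", for the host pass
    // "x86_64" (or similar).
    let arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    if arch.starts_with("nvptx") {
        // Nested GPU-side invocation: skip cuda_builder so we don't recurse.
        return;
    }
    // Host-side invocation: build the kernels in this crate to PTX here
    // (cuda_builder call elided).
}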

@RDambrosio016
Member

This is a hard problem. I think it's kind of impossible to solve without either a rustc fork or some weird hacks, primarily because cfg exists, which means we would need to compile things twice or use the CPU target for GPU codegen, which causes its own problems. It's just a hard issue overall.

@gzz2000
Author

gzz2000 commented Jun 9, 2022

This is a hard problem. I think it's kind of impossible to solve without either a rustc fork or some weird hacks, primarily because cfg exists, which means we would need to compile things twice or use the CPU target for GPU codegen, which causes its own problems. It's just a hard issue overall.

Thanks for your reply. I just came across an abandoned project, accel. Their code looks much like what we discussed, with a single source file and procedural macros. I have no idea how they implemented that, or whether it means compiling the code twice.

@jac-cbi
Contributor

jac-cbi commented Jul 1, 2022

Stupid question: would declaring the function as inline cause it to be built twice, once for each target?

@RDambrosio016
Member

Technically yes, but that would still abide by the CPU target's cfg settings, which causes problems. You would also need another custom codegen that derefs to cg_llvm for some things and nvvm for others, and that would moreover probably require forking cg_llvm to make some things public... it's a mess.

@jac-cbi
Contributor

jac-cbi commented Jul 1, 2022

Seems like a good place for a function macro, then.
