Single source for both CPU and GPU code possible? #49

Open
gzz2000 opened this issue Mar 1, 2022 · 7 comments

@gzz2000

gzz2000 commented Mar 1, 2022

Hello. Thanks for this awesome project. I can now compile CUDA kernels written in Rust into PTX in one Cargo package and use them in another package. Now I wonder whether it is possible to write both the kernels and the CPU code within one package, or even one Rust source file. For example, it might look like this:

// shared code between cpu and gpu. struct definitions may also be shared
// similar to CUDA's __device__ __host__
#[devicehost like thing...]
pub fn adder_both_cpu_gpu(a: f32, b: f32) -> f32 {
    a + b
}

#[kernel]
pub unsafe fn add(a: &[f32], b: &[f32], c: *mut f32) {
    let idx = thread::index_1d() as usize;
    if idx < a.len() {
        let elem = &mut *c.add(idx);
        *elem = adder_both_cpu_gpu(a[idx], b[idx]);
    }
}

fn main() {
    // use kernel fn add
}

Unfortunately, this does not seem possible at the moment. However, I think it is mostly a matter of some convenience macros. For example, the kernel macro could, instead of directly marking the function as a kernel, launch a separate cargo build with cuda_builder and replace the function on the CPU side with a lazy_static PTX module import. This would make it easier to manage dependencies and reuse code between the CPU and GPU sides, and it would save some boilerplate.
I don't know if you are interested... though it might be harder than I think. :(
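To make the idea concrete, here is a minimal sketch (not an existing API) of what such a macro could expand to on the CPU side, assuming the cust crate is used for launching, that build.rs has already written kernels.ptx into OUT_DIR via a nested cuda_builder run, and that the kernel has the add signature shown above; a real expansion would presumably cache the loaded module (for example behind lazy_static, as suggested) instead of reloading it on every call:

use cust::prelude::*;

// PTX produced by a nested cuda_builder run in build.rs; this path is an assumption.
static PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernels.ptx"));

// Hypothetical generated wrapper for launching the `add` kernel from the CPU side.
// Requires an active CUDA context (e.g. from cust::quick_init()).
fn launch_add(
    stream: &Stream,
    a: &DeviceBuffer<f32>,
    b: &DeviceBuffer<f32>,
    c: &mut DeviceBuffer<f32>,
) -> cust::error::CudaResult<()> {
    let module = Module::from_ptx(PTX, &[])?;
    let add = module.get_function("add")?;
    let block = 256u32;
    let grid = (a.len() as u32 + block - 1) / block;
    unsafe {
        // Slice arguments are passed to the kernel as (pointer, length) pairs.
        cust::launch!(add<<<grid, block, 0, stream>>>(
            a.as_device_ptr(),
            a.len(),
            b.as_device_ptr(),
            b.len(),
            c.as_device_ptr()
        ))?;
    }
    stream.synchronize()
}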

@rcarson3

rcarson3 commented Mar 6, 2022

I'd actually asked about this on the science-and-ai Rust Discord channel with regard to a no-std crate. I'm just rehashing some of that here, but @RDambrosio016 seemed to suggest it might be possible if you used #[cfg(target...)] for the things specific to CUDA kernel calling and made use of per-target dependencies. However, it seemed like there might be issues with the build.rs file deadlocking itself due to a cyclic dependency. One of the oxide-enzyme team members suggested you could get around that by checking whether you were on the first or second invocation of build.rs; they did this by creating a file during the first invocation and using it as the check.

I haven't had the time to check whether what was proposed there would actually work, though, as I've still been working on the design of the library.
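For reference, a minimal sketch of that marker-file trick, assuming the kernels live in this same crate and that cuda_builder's nested cargo invocation runs this build script a second time; the marker location next to Cargo.toml and the kernels.ptx output path are assumptions, and a real setup would still need to handle cleanup on failure and cargo's rerun-if-changed rules:

// build.rs
use std::{env, fs, path::PathBuf};

fn main() {
    let manifest_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap());
    // Marker written by the first (host) invocation; the nested GPU-side
    // invocation sees it and returns early instead of recursing forever.
    let marker = manifest_dir.join(".ptx-build-in-progress");
    if marker.exists() {
        return;
    }
    fs::write(&marker, "").unwrap();

    let ptx_out = PathBuf::from(env::var("OUT_DIR").unwrap()).join("kernels.ptx");
    cuda_builder::CudaBuilder::new(&manifest_dir)
        .copy_to(&ptx_out)
        .build()
        .expect("failed to build PTX");

    let _ = fs::remove_file(&marker);
}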

@beepster4096
Contributor

Couldn't you also check the CARGO_CFG_TARGET_ARCH var in the build script to avoid the deadlock?
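That could look something like this; a minimal sketch assuming the GPU pass is compiled for an nvptx64 target, with the actual cuda_builder call elided:

// build.rs
use std::env;

fn main() {
    // Cargo exposes the target architecture to build scripts via this variable;
    // for the GPU pass it would be something like "nvptx64", for the host pass
    // "x86_64" (or similar).
    let arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
    if arch.starts_with("nvptx") {
        // Nested GPU-side invocation: skip cuda_builder so we don't recurse.
        return;
    }
    // Host-side invocation: build the kernels in this crate to PTX here
    // (cuda_builder call elided).
}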

@RDambrosio016
Member

This is a hard problem. I think it's kind of impossible to solve without either a rustc fork or some weird hacks, primarily because cfg exists, which means we would need to compile things twice or use the CPU target for GPU codegen, which causes its own problems. It's just a hard issue overall.

@gzz2000
Author

gzz2000 commented Jun 9, 2022

This is a hard problem. I think it's kind of impossible to solve without either a rustc fork or some weird hacks, primarily because cfg exists, which means we would need to compile things twice or use the CPU target for GPU codegen, which causes its own problems. It's just a hard issue overall.

Thanks for your reply. I just came across an abandoned project, accel. Their code looks much like what we discussed, with a single source file and procedural macros. I have no idea how they implemented that, or whether it means compiling the code twice.

@jac-cbi
Contributor

jac-cbi commented Jul 1, 2022

Stupid question: would declaring the function as inline cause it to be built twice, once for each target?

@RDambrosio016
Member

Technically yes, but that would still abide by the CPU target's cfg settings, which causes problems. You would also need another custom codegen that derefs to cg_llvm for some things and nvvm for others, and that would moreover probably require forking cg_llvm to make some things public... it's a mess.

@jac-cbi
Contributor

jac-cbi commented Jul 1, 2022

Seems like a good place for a function macro, then.
