-
Notifications
You must be signed in to change notification settings - Fork 187
Fix read_volatile
intrinsic
#216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Note I have never touched this code before and we don't have tests, so I am very unsure of this change. |
ce65c5e
to
e72a113
Compare
016c8b4
to
fab7455
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
jedisct1/rust-ed25519-compact@master...brandonros:rust-ed25519-compact:master Can you give this a look? I did this as a wild guess I'm afraid. ![]() Can I do this atomic/fence part better/differently? I see you are fixing |
Ignore that one, that's the "working" library (rust-ed25519-compact) actually. The non-working one is the more popular curve25519-dalek. Here is the IllegalAddress one: let scalar = curve25519_dalek::Scalar::from_bytes_mod_order(input); // good
let point = curve25519_dalek::constants::ED25519_BASEPOINT_TABLE * &scalar; // good
let compressed_point = point.compress(); // problem in here
/*let public_key_bytes = compressed_point.to_bytes();
public_key_bytes*/ Let me figure out exactly which line in curve25519_dalek is giving us the problem, might give us a clue |
Update: drilling it into 1 specific line from curve25519-dalek library let mut input = [0u8; 32];
input.copy_from_slice(&hashed_private_key_bytes[0..32]);
let scalar = curve25519_dalek::Scalar::from_bytes_mod_order(input);
let point = curve25519_dalek::constants::ED25519_BASEPOINT_TABLE * &scalar;
let recip = point.Z.invert();
let x = &point.X * &recip;
let y = &point.Y * &recip;
let mut s: [u8; 32];
s = y.as_bytes();
s[31] ^= x.is_negative().unwrap_u8() << 7; // <<< it is this line that causes IllegalAddress will keep digging in |
If I convert this impl FieldElement {
/// Determine if this `FieldElement` is negative, in the sense
/// used in the ed25519 paper: `x` is negative if the low bit is
/// set.
///
/// # Return
///
/// If negative, return `Choice(1)`. Otherwise, return `Choice(0)`.
pub fn is_negative(&self) -> Choice {
let bytes = self.as_bytes();
(bytes[0] & 1).into()
} into this: let x_bytes = x.as_bytes();
let x_is_negative = x_bytes[0] & 1;
s[31] ^= x_is_negative << 7; one IllegalAddress problem goes away and I get further I ended up with this... I might be chasing a ghost/it might be a different issue (like some alignment issue or something that throws things off down the stack or something) else {
let mut input = [0u8; 32];
input.copy_from_slice(&hashed_private_key_bytes[0..32]);
let scalar = curve25519_dalek::Scalar::from_bytes_mod_order(input);
let point = curve25519_dalek::constants::ED25519_BASEPOINT_TABLE * &scalar;
let recip = point.Z.invert();
let x = &point.X * &recip;
let y = &point.Y * &recip;
let mut s: [u8; 32];
s = y.as_bytes();
let x_bytes = x.as_bytes();
let x_is_negative = x_bytes[0] & 1;
s[31] ^= x_is_negative << 7;
let compressed_point = curve25519_dalek::edwards::CompressedEdwardsY(s);
let public_key_bytes = compressed_point.to_bytes();
//public_key_bytes // if you try to return this, IllegalAddress
[0u8; 32] // if you return this, everything else above it passes?
} I'll leave it on the branch for you to repro if you want |
sorry for the spam, journey continues the function can return the Otherwise it's like a Drop or something that I've seen when playing with Rust and FFI before. Similar behavior.
I used to have do things like this |
Even trying this let public_key_bytes = compressed_point.to_bytes();
output.copy_from_slice(&public_key_bytes[0..32]); let mut public_key_bytes = [0u8; 32];
derrive_public_key(hashed_private_key_bytes, &mut public_key_bytes); or this let mut output = [0u8; 32];
output.copy_from_slice(&public_key_bytes[0..32]);
output does not seem to make it much better |
// ed25519 private key -> public key (first 32 bytes only)
cuda_std::println!("hashed_private_key_bytes: {:02x?}", hashed_private_key_bytes);
let mut public_key_bytes = [0u8; 32];
derrive_public_key(hashed_private_key_bytes, &mut public_key_bytes);
//cuda_std::println!("public_key_bytes: {}", public_key_bytes[0]); // if you uncomment this, IllegalAddress This is about as simple as I can boil it down to |
this was a pretty bad miss on my end... if the final result is not needed, it's most likely (obviously) dropped by the compiler let me add a println! in between each line/result and see if that helps |
just this, fine
by enabling the multiplication... it breaks even the "#1" working step... will continue to dive in, don't expect you to read all this, hoping it helps |
No, the stream of consciousness is good and helps! Thanks for sticking with it. I had originally looked at |
So I did something a bit strange/foolish to help dive into this: https://github.com/brandonros/nvptx-llvm-poc Just to remove Rust-CUDA from the mix, I made a small pipeline of Rust -> LLVM IR -> NVPTX to see if it could shed insight.
# Build with cargo
pushd kernel
cargo clean
cargo +nightly build --release --target nvptx64-nvidia-cuda
popd
# Find the generated .ll file (cargo puts it in a nested directory)
find kernel/target/nvptx64-nvidia-cuda/release/deps -name "kernel.ll" -exec cp {} output/kernel.ll \;
# Convert .ll to .ptx
llc -march=nvptx64 -mcpu=sm_75 output/kernel.ll -o output/kernel.ptx Does this help us? I am on a newer version of nightly... Shall I try this code without the precomputed table, let me see how that goes... |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as outdated.
This comment was marked as outdated.
cc392bb
to
efd57d0
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
I'm going to split the |
This isn't fuly correct, as ideally we keep track of what we have put into constant memory and when it is filled up spill insead of only spilling when a static is big. But, this is materially better than what is there. Fixes Rust-GPU#208. See also the debugging and discussion in Rust-GPU#216
This isn't fully correct, as ideally we keep track of what we have put into constant memory and when it is filled up spill insead of only spilling when a static is big. But, this is materially better than what is there (which is a runtime error). An argument can be made to just _always_ use global memory and we don't have to worry about getting the packing right. Fixes Rust-GPU#208. See also the debugging and discussion in Rust-GPU#216
This isn't fully correct, as ideally we keep track of what we have put into constant memory and when it is filled up spill instdead of only spilling when a static is big. But, this is materially better than what is there (which is a runtime error). An argument can be made to just _always_ use global memory and we don't have to worry about getting the packing right. Fixes Rust-GPU#208. See also the debugging and discussion in Rust-GPU#216
* Allow address spaces to propagate to LLVM This looks like it was code that wasn't deleted after the refactor in decda87 * Spill large statics from constant to global memory This isn't fully correct, as ideally we keep track of what we have put into constant memory and when it is filled up spill instdead of only spilling when a static is big. But, this is materially better than what is there (which is a runtime error). An argument can be made to just _always_ use global memory and we don't have to worry about getting the packing right. Fixes #208. See also the debugging and discussion in #216 * Add `--use-constant-memory-space` flag, off by default * Make it clear that `#[cuda_std::address_space(constant)]` still works
@LegNeato I just confirmed we need this. Without it |
It would always generate PTX to load only a 1-bit integer (`i1`) from the provided pointer, regardless of the actual size of `T`. The alignment specified for the load was also based on the pointer's alignment, not the pointee's alignment. This commit changes it to: 1. Load the full `llvm_type_of(T)` instead of `i1`. 2. Use the correct alignment of `T` for the load instruction. 3. Removes an incorrect pointer cast that was based on the return ABI of `T` rather than the type of the memory being read. Possibly fixes Rust-GPU#208.
are you good to merge, anything you want to wait for? |
I'm not exactly sure this is correct in all cases, I hit a different issue while investigating the sha2 stuff and trying to track down if this is the cause. Not familiar with this area so fumbling around in the dark a bit. |
It would always generate PTX to load only a 1-bit integer (
i1
) from the provided pointer, regardless of the actual size ofT
. The alignment specified for the load was also based on the pointer's alignment, not the pointee's alignment.This commit changes it to:
llvm_type_of(T)
instead ofi1
.T
for the load instruction.T
rather than the type of the memory being read.Possibly fixes #208.