Skip to content

add bytes conversion #5111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

add bytes conversion #5111

wants to merge 3 commits into from

Conversation

kykosic
Copy link

@kykosic kykosic commented May 4, 2025

Add conversions to bytes::Bytes crate.

This can also help some use cases related to #2888 when trying to deal with PyBytes.

I would appreciate feedback with regard to BytesMut. Currently it gets converted into PyBytes, and can be extracted from PyBytes and PyByteArray. This could cause confusion for users who expect a function taking ByteMut to offer a shared mutable buffer between Python and Rust, which cannot exist without a GIL reference. It might also make sense to only offer Bytes as you can always do BytesMut::from(Bytes).

@mejrs
Copy link
Member

mejrs commented May 4, 2025

I'm unsure whether this conversion is a good idea. This implementation makes a full copy of the data every time (both ways) which I think people might not expect. So I'd err on the side of having such conversions be explicit, not implicit.

@mejrs
Copy link
Member

mejrs commented May 4, 2025

See also #1992

@davidhewitt
Copy link
Member

We could probably avoid copying from Python to Rust by using Bytes::from_owner (maybe with PyBackedBytes as the owner) but going from Rust to Python would almost certainly require a copy.

I'd love it for Python bytes objects to support a similar API where they allow construction from a slice and a destructor, then we could avoid copies both directions. But we're not there yet.

@davidhewitt
Copy link
Member

... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?

@bschoenmaeckers
Copy link
Member

I think this can still be useful if python -> rust is non-copy but rust -> bytes is copy, as IntoPyObject for Vec<u8> is already doing a copy.

@lmmx
Copy link

lmmx commented May 5, 2025

In case it’s not on your radars here, the pyo3-bytes crate does this zero-copy (used by the pyo3-object_store cloud IO crate, and from the same repo).

The approach @davidhewitt suggests above is precisely the way it took:

This provides [PyBytes], a wrapper around [Bytes][::bytes::Bytes] that supports the Python buffer protocol.

This uses the new Bytes::from_owner API introduced in bytes 1.9.

In that lib a PyBytesWrapper is passed into Bytes::from_owner() (thus the owner of the underlying memory), see the FromPyObject impl:

impl<'py> FromPyObject<'py> for PyBytes {
    fn extract_bound(ob: &Bound<'py, PyAny>) -> PyResult<Self> {
        let buffer = ob.extract::<PyBytesWrapper>()?;
        let bytes = Bytes::from_owner(buffer);
        Ok(Self(bytes))
    }
}

The buffer: PyBytesWrapper itself wraps a PyBuffer<u8>.

/// A wrapper around a PyBuffer that applies a custom destructor that checks if the Python
/// interpreter is still initialized before freeing the buffer memory.
///
/// This also implements AsRef<[u8]> because that is required for Bytes::from_owner
#[derive(Debug)]
struct PyBytesWrapper(Option<PyBuffer<u8>>);

@kykosic
Copy link
Author

kykosic commented May 5, 2025

... so while I think this is probably a useful thing to have, it might be that it's a future useful thing to have?

If I were to make it zero copy Python to Rust, would it still be worth merging and possibly improve to full zero-copy in the future?

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I were to make it zero copy Python to Rust...

Can you use the from_owner constructor mentioned earlier? I'm fine with FromPyObject for Bytes, but iffy about the other way around.

Comment on lines +26 to +29
//! #[pyfunction]
//! fn get_message_bytes() -> Bytes {
//! Bytes::from(b"Hello Python!".to_vec())
//! }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient to create a PyBytes object directly here. This is an example of why I'm not a fan of the Rust to Python conversion.

Comment on lines +31 to +34
//! #[pyfunction]
//! fn num_bytes(bytes: Bytes) -> usize {
//! bytes.len()
//! }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example is not that great, the user should just use &[u8] in this case. Please change the example to something where the Bytes is actually needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants