Skip to content

Binary expects data to be bytes #399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 28, 2019
Merged

Conversation

jakirkham
Copy link
Contributor

No description provided.

@salty-horse
Copy link

I believe the text you change refers to Python 2.
The next words already mention bytes and Python 3.

IMO a better change would be to mention each version and its type explicitly: str in Python 2, and bytes in Python 3.

@behackett
Copy link
Member

@salty-horse is correct. PyMongo used to support Python versions all the way back to 2.3. bytes was new in Python 2.6 (and just an alias for str in 2.6 and 2.7). We've dropped support for Python 2.6 in master. It's probably OK to change the string this way, but also get rid of the redundant "(:class:bytes in python 3)" part.

@salty-horse
Copy link

It's probably OK to change the string this way, but also get rid of the redundant "(:class:bytes in python 3)" part.

Please don't remove documentation for Python 2 from the library until support is dropped completely. People are still using it.

@behackett
Copy link
Member

This wouldn't remove Python 2 docs. bytes exists in Python 2. It's an alias for str.

@salty-horse
Copy link

It's an alias for the sake of easing Python 3 migrations/compatibility. Documentation relating to the Python 2 type str should use str to refer to it rather than bytes.

Making this line only mention bytes, which technically correct for both Python 2 and 3, will make it harder to understand for anyone coming from a Python 2 background. I don't think the suggested change is worth the loss of clarity.

@jakirkham
Copy link
Contributor Author

If this is intended to refer to Python 2, which was not actually clear to me, I would suggest choosing language in the docs that makes that clear.

@salty-horse
Copy link

I don't understand the rush to favor one Python version over another, when the docs can be explicit about both with little effort.

Compare with a similar note in bson/__init__:

Note that, when using Python 2.x, to save binary data it must be wrapped as
an instance of `bson.binary.Binary`. Otherwise it will be saved as a BSON
string and retrieved as unicode. Users of Python 3.x can use the Python bytes
type.

@ShaneHarvey
Copy link
Member

I don't understand the rush to favor one Python version over another, when the docs can be explicit about both with little effort.

We are not favoring one version over another and we are still documenting both versions. However, I do think it makes sense to refer to the Python 3 types first and Python 2 types second. Python 3 is the future. Seasoned python developers who write both 3 and 2 will understand the documentation regardless of which version appears first but I think referring to Python 3 first will help newer developers who only know Python 3.

So I think changing from :class:`str` (:class:`bytes` in python 3) to :class:`bytes` (:class:`str` in python 2) is beneficial.

@jakirkham
Copy link
Contributor Author

Note that, when using Python 2.x, to save binary data it must be wrapped as an instance of bson.binary.Binary. Otherwise it will be saved as a BSON string and retrieved as unicode.

On a different note (and part of my motivation for looking at this part of the documentation), we didn't find this to be exactly true. When saving a bytes object, we found it returned a bson.binary.Binary object instead. Also we were a little confused as we didn't realize bson.binary.Binary inherits from bytes. Naively we had assumed we would just get a bytes object back on both Python versions. This is of course outside the scope of this PR, but figured the feedback might be useful.

@jakirkham
Copy link
Contributor Author

Please let me know if anything else is needed :)

@jakirkham
Copy link
Contributor Author

Anything else needed here?

@ShaneHarvey ShaneHarvey merged commit 5950abf into mongodb:master Mar 28, 2019
@ShaneHarvey
Copy link
Member

Thanks for the fix @jakirkham.

Naively we had assumed we would just get a bytes object back on both Python versions.

Unfortunately, we cannot return a bytes object in both Python 2 and Python 3 because in Python 2 bytes is just an alias for str. So decoding it to Binary in Python 2 and bytes in Python 3 is the best we can do.

Consider that if we did decode to bytes in Python 2 this is what would happen:

>>> from bson import BSON
>>> from bson.binary import Binary
>>>
>>> encoded = BSON.encode({'binary': Binary(b'my_bytes')})
>>> decoded = encoded.decode()
>>> decoded
{u'binary': 'my_bytes'}  # Decoded Binary subtype 0 as "bytes" in Python 2
>>> # This might seem okay until we try to encode this document again: 
>>> encoded = BSON.encode(decoded) 
>>> decoded = encoded.decode()
>>> decoded
{u'binary': u'my_bytes'}  # Notice the "u" prefix?
>>> # 'my_bytes' is now a unicode because we encode `str` to BSON's UTF-8 string type and then decode to Python 2 unicode. 

ShaneHarvey pushed a commit that referenced this pull request Mar 28, 2019
@jakirkham jakirkham deleted the patch-1 branch April 15, 2019 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants