-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector #2162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector #2162
Conversation
@NoahStapp One question in my mind. Is using an f-string an issue? If the vector is really large, will |
Can you add the original PYTHON ticket for BSON Binary Vector if we're not adding a new ticket? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a new python ticket for this because it's a new feature.
bson/binary.py
Outdated
@@ -247,6 +247,9 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin | |||
self.dtype = dtype | |||
self.padding = padding | |||
|
|||
def __repr__(self) -> str: | |||
return f"BinaryVector - {self.dtype=}, {self.dtype},{self.padding=}, {self.data=}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to follow Python's convention for repr: https://docs.python.org/3/library/functions.html#repr
And we should add tests for this like we do for our other custom type reprs.
f-strings are always evaluated, even if never used. In this case, I don't know if the performance benefit of f-strings is offset by a very large vector being parsed into a string at runtime. It's probably worth running some quick benchmarks to see if there is a performance difference here. |
Yeah. I'm going to switch to format. These vectors could get massive. I don't want users to think mongodb is slow when they've simply got something in a logger.warning statement.. |
Our pre-commit automatically converts str.format into f-string! What's the workaround? |
You can ignore specific ruff rules with |
Should we consider adding the following to pyproject.toml?
|
Avoiding cluttering the codebase with unused directives is a pattern we follow consistently, are you seeing linting failures without this change? |
Wrong rule. That was from ChatGPT. |
bson/binary.py
Outdated
@@ -247,6 +247,11 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin | |||
self.dtype = dtype | |||
self.padding = padding | |||
|
|||
def __repr__(self) -> str: | |||
return "BinaryVector(dtype={}, padding={}, data={})".format( # noqa: UP032 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will repr create the string even when it's not called?
f-string vs format makes no difference. If anything f-string will be faster. __repr__
only runs when repr() is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
only runs when repr() is called.
So if one has logger.debug(vector), is it not true that str.format won't be called, but the f-string will be created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging behavior is unrelated to this PR since we're just adding a new method. The behavior of repr(vector)
is the same whether we use f-string vs format() so let's go back to using f-string.
Test failures are async ones, unrelated to these. |
four = BinaryVector(data, BinaryVectorDtype.PACKED_BIT, padding=3) | ||
self.assertEqual( | ||
repr(four), f"BinaryVector(dtype=BinaryVectorDtype.PACKED_BIT, padding=3, data={data})" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add asserts using assertRepr like we do in test_connection_monitoring.py?:
def assertRepr(self, obj):
new_obj = eval(repr(obj))
self.assertEqual(type(new_obj), type(obj))
self.assertEqual(repr(new_obj), repr(obj))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. That's very cool, that one can create an object using eval(repr(obj))
@@ -247,6 +247,9 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin | |||
self.dtype = dtype | |||
self.padding = padding | |||
|
|||
def __repr__(self) -> str: | |||
return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold up, the real bug here is that we're using @dataclass
but also defining __init__
. Why is that? The whole point of dataclass is that it provides __init__
and __repr__
automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that we had issues with the combo of {dataclass, slots, defaults}. So I've just removed the dataclass decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other features that will break by removing @dataclass
? We may be stuck with it until 5.0 if it's a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. As mentioned in the closed PR, we've done init, and repr, and we don't want to rely on equality except via the binary representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems happy to be a dataclass or not. I have no strong preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show an example of ==
and >
before and after this change? I'm wondering if the old code (with @dataclass
) even works at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. It's easy to try. Put a breakpoint on line 83 of test_bson_binary.
# With the decorator:
vector_obs == vector_exp.as_vector()
True
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'
# Without the decorator:
vector_obs == vector_exp.as_vector()
False
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
==
is already broken with our current @dataclass
approach:
>>> from bson.binary import *
>>> BinaryVector([1], BinaryVectorDtype.INT8) == BinaryVector([2], BinaryVectorDtype.INT8)
True
So we need to remove dataclass and implement ==
as well as __repr__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Take a look. I hate subtleties with == and floats..
…nd __repr__ manually.
…init__ and __repr__ manually." This reverts commit 916ec33.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Could you add a note for this in the changelog?
Sure! Is this targeting 4.11 or 4.12? If the former, should I bump the micro version and add one line for description and another for jira? If the latter, should I additionallyx start up a whole new section? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, let's add this to changelog later since I'm adding the new 4.12 section in another PR right now and we may end up being backporting this as a bug fix. Could you update the PR title and Jira ticket to say this is a bug fix for repr
and ==
?
bson/binary.py
Outdated
def __repr__(self) -> str: | ||
return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})" | ||
|
||
def __eq__(self, other: BinaryVector) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typing failure:
bson/binary.py:251: error: Argument 1 of "__eq__" is incompatible with
supertype "object"; supertype defines the argument type as "object" [override]
def __eq__(self, other: BinaryVector) -> bool:
^~~~~~~~~~~~~~~~~~~
Updated names. In the JIRA, I marked it as bug instead of improvement. Please adjust if that's not what you meant. |
No ticket. I just noticed this while working on #2161