PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector #2162

caseyclements · 2025-02-27T18:51:16Z

No ticket. I just noticed this while working on #2161

caseyclements · 2025-02-27T19:06:26Z

@NoahStapp One question in my mind. Is using an f-string an issue? If the vector is really large, will __repr__ create the string even when it's not called?

blink1073 · 2025-02-27T22:54:22Z

Can you add the original PYTHON ticket for BSON Binary Vector if we're not adding a new ticket?

ShaneHarvey

Let's add a new python ticket for this because it's a new feature.

ShaneHarvey · 2025-02-27T23:13:41Z

bson/binary.py

@@ -247,6 +247,9 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin
        self.dtype = dtype
        self.padding = padding

+    def __repr__(self) -> str:
+        return f"BinaryVector - {self.dtype=}, {self.dtype},{self.padding=}, {self.data=}"


This needs to follow Python's convention for repr: https://docs.python.org/3/library/functions.html#repr

And we should add tests for this like we do for our other custom type reprs.

NoahStapp · 2025-02-28T15:01:25Z

f-strings are always evaluated, even if never used. In this case, I don't know if the performance benefit of f-strings is offset by a very large vector being parsed into a string at runtime. It's probably worth running some quick benchmarks to see if there is a performance difference here.

caseyclements · 2025-02-28T15:10:04Z

Yeah. I'm going to switch to format. These vectors could get massive. I don't want users to think MongoDB is slow when they've simply got something in a logger.warning statement.

caseyclements · 2025-02-28T15:16:34Z

Our pre-commit config automatically converts str.format calls into f-strings! What's the workaround?

NoahStapp · 2025-02-28T15:19:54Z

You can ignore specific ruff rules with # noqa: <rule> on the offending line.

caseyclements · 2025-02-28T15:33:19Z

Should we consider adding the following to pyproject.toml?

[tool.ruff]
ignore = ["RUF100"]

NoahStapp · 2025-02-28T16:06:32Z

Avoiding clutter from unused directives is a pattern we follow consistently. Are you seeing linting failures without this change?

caseyclements · 2025-02-28T16:13:17Z

Wrong rule. That was from ChatGPT. UP032 is the f-string one. I don't know. It feels like when someone is using format they probably have chosen to do so for a reason.

ShaneHarvey · 2025-02-28T19:09:08Z

bson/binary.py

@@ -247,6 +247,11 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin
        self.dtype = dtype
        self.padding = padding

+    def __repr__(self) -> str:
+        return "BinaryVector(dtype={}, padding={}, data={})".format(  # noqa: UP032


will repr create the string even when it's not called?

f-string vs format makes no difference. If anything f-string will be faster. __repr__ only runs when repr() is called.

__repr__ only runs when repr() is called.

So if one has logger.debug(vector), is it not true that str.format won't be called, but the f-string will be created?

Logging behavior is unrelated to this PR since we're just adding a new method. The behavior of repr(vector) is the same whether we use f-string vs format() so let's go back to using f-string.
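A minimal sketch of the point being discussed, using a hypothetical CountingVector class in place of BinaryVector: the f-string inside __repr__ is only built when __repr__ itself runs, and logging's %-style arguments defer that call until a record is actually emitted. What is evaluated eagerly is an f-string written at the call site.

```python
import logging


class CountingVector:
    """Hypothetical stand-in for BinaryVector that counts how often __repr__ runs."""

    def __init__(self, data):
        self.data = data
        self.repr_calls = 0

    def __repr__(self) -> str:
        self.repr_calls += 1
        # This f-string is only built when __repr__ is actually invoked.
        return f"CountingVector(data={self.data})"


logger = logging.getLogger("repr_demo")
logger.setLevel(logging.WARNING)  # DEBUG records are discarded

vec = CountingVector(list(range(5)))

# Lazy: the level check fails first, so "%r" is never formatted and __repr__ never runs.
logger.debug("vector: %r", vec)
print(vec.repr_calls)  # 0

# Eager: the caller's f-string calls __repr__ before logger.debug() is even entered.
logger.debug(f"vector: {vec!r}")
print(vec.repr_calls)  # 1
```

The same holds if __repr__ used str.format() instead: the formatting style inside the method does not change when the method runs.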

caseyclements · 2025-02-28T21:49:13Z

The test failures are in async tests and are unrelated to these changes.

ShaneHarvey · 2025-02-28T22:10:15Z

test/test_bson.py

+        four = BinaryVector(data, BinaryVectorDtype.PACKED_BIT, padding=3)
+        self.assertEqual(
+            repr(four), f"BinaryVector(dtype=BinaryVectorDtype.PACKED_BIT, padding=3, data={data})"
+        )


Could you add asserts using assertRepr, like we do in test_connection_monitoring.py?

def assertRepr(self, obj):
    new_obj = eval(repr(obj))
    self.assertEqual(type(new_obj), type(obj))
    self.assertEqual(repr(new_obj), repr(obj))

Done. That's very cool, that one can create an object using eval(repr(obj))
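A sketch of the round trip that the assertRepr helper checks, using hypothetical stand-ins ReprVector and ReprDtype rather than the real bson classes (the enum values are illustrative). It works only because str() of a plain Enum member is a valid expression such as ReprDtype.PACKED_BIT, because the data prints as a list literal, and because eval() can see both class names.

```python
from enum import Enum


class ReprDtype(Enum):
    """Hypothetical stand-in for BinaryVectorDtype; the values are illustrative."""

    INT8 = b"\x03"
    FLOAT32 = b"\x27"
    PACKED_BIT = b"\x10"


class ReprVector:
    """Hypothetical stand-in for BinaryVector, not the pymongo class."""

    def __init__(self, data, dtype, padding=0):
        self.data = data
        self.dtype = dtype
        self.padding = padding

    def __repr__(self) -> str:
        # str() of a plain Enum member is "ReprDtype.PACKED_BIT", a valid expression.
        return f"ReprVector(dtype={self.dtype}, padding={self.padding}, data={self.data})"


def assert_repr_round_trips(obj):
    """Same checks as the assertRepr helper, written with bare asserts."""
    new_obj = eval(repr(obj))
    assert type(new_obj) is type(obj)
    assert repr(new_obj) == repr(obj)
    return new_obj


vec = ReprVector([127, 7], ReprDtype.PACKED_BIT, padding=3)
print(repr(vec))  # ReprVector(dtype=ReprDtype.PACKED_BIT, padding=3, data=[127, 7])
rebuilt = assert_repr_round_trips(vec)
```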

ShaneHarvey · 2025-02-28T22:59:05Z

bson/binary.py

@@ -247,6 +247,9 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin
        self.dtype = dtype
        self.padding = padding

+    def __repr__(self) -> str:
+        return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})"
+


Hold up, the real bug here is that we're using @dataclass but also defining __init__. Why is that? The whole point of dataclass is that it provides __init__ and __repr__ automatically.

I believe that we had issues with the combo of {dataclass, slots, defaults}. So I've just removed the dataclass decorator.

Are there any other features that will break by removing @dataclass? We may be stuck with it until 5.0 if it's a breaking change.

I don't think so. As mentioned in the closed PR, we've implemented __init__ and __repr__ ourselves, and we don't want to rely on equality except via the binary representation.

The code seems happy to be a dataclass or not. I have no strong preference.

https://jira.mongodb.org/browse/DRIVERS-3123

Can you show an example of == and > before and after this change? I'm wondering if the old code (with @dataclass) even works at all.

Sure. It's easy to try. Put a breakpoint on line 83 of test_bson_binary.

# With the decorator:
vector_obs == vector_exp.as_vector()
True
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'

# Without the decorator:
vector_obs == vector_exp.as_vector()
False
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'

== is already broken with our current @dataclass approach:

>>> from bson.binary import *
>>> BinaryVector([1], BinaryVectorDtype.INT8) == BinaryVector([2], BinaryVectorDtype.INT8)
True

So we need to remove dataclass and implement == as well as __repr__.
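One way to reproduce that always-True ==, sketched with a hypothetical SlotsOnly class that assumes the shape described in this thread (attributes listed in __slots__ and assigned in a hand-written __init__, with no annotated fields). @dataclass only collects annotated class attributes as fields and leaves an explicitly defined __init__ alone, so here it sees zero fields and generates an __eq__ and __repr__ over nothing.

```python
from dataclasses import dataclass, fields


@dataclass
class SlotsOnly:
    """Hypothetical reduction of the pre-fix shape: slots plus a hand-written __init__."""

    __slots__ = ("data", "dtype", "padding")

    def __init__(self, data, dtype, padding=0):
        # @dataclass keeps this __init__ because the class already defines one.
        self.data = data
        self.dtype = dtype
        self.padding = padding


# No annotated class attributes, so the dataclass has no fields at all.
print(fields(SlotsOnly))  # ()

a = SlotsOnly([1], "int8")
b = SlotsOnly([2], "int8")
print(a == b)  # True: the generated __eq__ compares zero fields
print(repr(a))  # SlotsOnly()
```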

Done. Take a look. I hate subtleties with == and floats.
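One concrete source of float subtlety, sketched with the standard struct module and assuming FLOAT32 vectors are stored as little-endian 32-bit floats: a Python float (64-bit) generally does not survive a float32 round trip exactly, so a strict == against the original list fails while a tolerance-based comparison passes.

```python
import math
import struct

original = [0.1, 0.5, 1.0 / 3.0]

# Pack as little-endian float32 and unpack again (assumed FLOAT32 storage format).
packed = struct.pack(f"<{len(original)}f", *original)
round_tripped = list(struct.unpack(f"<{len(original)}f", packed))

print(round_tripped[0])  # 0.10000000149011612
print(round_tripped == original)  # False: 0.1 and 1/3 are not exactly representable in float32
print(all(math.isclose(x, y, rel_tol=1e-6) for x, y in zip(original, round_tripped)))  # True
```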

ShaneHarvey

Looking good! Could you add a note for this in the changelog?

caseyclements · 2025-03-10T17:04:58Z

Sure! Is this targeting 4.11 or 4.12? If the former, should I bump the micro version and add one line for the description and another for the Jira ticket? If the latter, should I also start a whole new section?

ShaneHarvey

On second thought, let's add this to the changelog later, since I'm adding the new 4.12 section in another PR right now and we may end up backporting this as a bug fix. Could you update the PR title and Jira ticket to say this is a bug fix for repr and ==?

ShaneHarvey · 2025-03-10T18:25:02Z

bson/binary.py

+    def __repr__(self) -> str:
+        return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})"
+
+    def __eq__(self, other: BinaryVector) -> bool:


Typing failure:

bson/binary.py:251: error: Argument 1 of "__eq__" is incompatible with supertype "object"; supertype defines the argument type as "object"  [override]
        def __eq__(self, other: BinaryVector) -> bool:
                         ^~~~~~~~~~~~~~~~~~~
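A minimal sketch of an override that avoids this error, using a hypothetical EqVector class rather than the merged code: the parameter is typed Any (object also works), which matches object.__eq__, and non-matching types return NotImplemented so Python can try the reflected comparison before falling back to identity.

```python
from typing import Any, Sequence, Union


class EqVector:
    """Hypothetical stand-in for BinaryVector showing a mypy-compatible __eq__."""

    __slots__ = ("data", "dtype", "padding")

    def __init__(self, data: Sequence[Union[float, int]], dtype: str, padding: int = 0) -> None:
        self.data = data
        self.dtype = dtype
        self.padding = padding

    def __eq__(self, other: Any) -> bool:
        # Typing `other` as Any (or object) matches object.__eq__, so no [override] error.
        if not isinstance(other, EqVector):
            # Let Python try the reflected comparison, then fall back to identity.
            return NotImplemented
        return (self.dtype, self.padding, self.data) == (other.dtype, other.padding, other.data)


print(EqVector([1], "int8") == EqVector([1], "int8"))  # True
print(EqVector([1], "int8") == EqVector([2], "int8"))  # False
print(EqVector([1], "int8") == "not a vector")  # False
```

Defining __eq__ without __hash__ also sets __hash__ to None, so instances are unhashable; that matches what a mutable dataclass (eq=True, frozen=False) already did.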

caseyclements · 2025-03-10T18:54:42Z

Updated the titles. In Jira, I marked it as a bug instead of an improvement. Please adjust if that's not what you meant.

Adds __repr__ to BinaryVector dataclass

519291c

caseyclements requested a review from NoahStapp February 27, 2025 18:51

Typing

a4ce5c7

ShaneHarvey reviewed Feb 27, 2025

Updated repr form to match python convention

8b46ecf

Updated repr form to match python convention and added repr tests

ba43d0c

caseyclements requested a review from ShaneHarvey February 28, 2025 15:50

Removed unused noqa rule RUF100

b83bee4

ShaneHarvey requested changes Feb 28, 2025

Reverted BinaryVector repr to f-string following python convention

8bd9482

caseyclements requested a review from ShaneHarvey February 28, 2025 21:40

caseyclements changed the title ~~Adds __repr__ to BinaryVector dataclass~~ PYTHON-5172 Adds __repr__ to BinaryVector dataclass Feb 28, 2025

ShaneHarvey reviewed Feb 28, 2025

Additional test of repr form: eval(repr(obj)) ~= obj

389b1cf

caseyclements requested a review from ShaneHarvey February 28, 2025 22:38

ShaneHarvey changed the title ~~PYTHON-5172 Adds __repr__ to BinaryVector dataclass~~ PYTHON-5172 Add __repr__ to BinaryVector dataclass Feb 28, 2025

ShaneHarvey requested changes Feb 28, 2025

caseyclements mentioned this pull request Mar 3, 2025

PYTHON-5172 Remove __init__ from BinaryVector dataclass #2171

Closed

Remove dataclass decorator from BinaryVector. Instead, add __init__ and __repr__ manually.

916ec33

caseyclements requested a review from ShaneHarvey March 4, 2025 16:49

caseyclements requested review from blink1073 and removed request for blink1073 March 4, 2025 16:49

caseyclements added 2 commits March 5, 2025 10:29

Revert "Remove dataclass decorator from BinaryVector. Instead, add __init__ and __repr__ manually." This reverts commit 916ec33.

023d8ca

BinaryVector is no longer a dataclass. Instead, we implement __eq__

ceb6786

ShaneHarvey requested changes Mar 6, 2025

ShaneHarvey approved these changes Mar 10, 2025

ShaneHarvey requested changes Mar 10, 2025

Typing, other: Any

f35ef2b

caseyclements changed the title ~~PYTHON-5172 Add __repr__ to BinaryVector dataclass~~ PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector Mar 10, 2025

ShaneHarvey approved these changes Mar 10, 2025

caseyclements merged commit b66a5cb into mongodb:master Mar 10, 2025
35 of 37 checks passed
