Skip to content

dataclass overriding a property in the parent class is broken #121354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
marksandler2 opened this issue Jul 3, 2024 · 11 comments
Open

dataclass overriding a property in the parent class is broken #121354

marksandler2 opened this issue Jul 3, 2024 · 11 comments
Assignees
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error

Comments

@marksandler2
Copy link

marksandler2 commented Jul 3, 2024

Bug report

Bug description:

Consider the following snippet

@dc.dataclass
class A:
  a: int
  @property
  def b(self) -> int:
    return 1

@dc.dataclass
class B(A):  
  b: int = dc.field(default_factory=lambda: 0)

Dataclass apparently doesn't create a class attribute, so class B ends up with property , b of class A, which seems surprising, especially given that there is explicit assignent of field b in class B(A).

So this results in:

>>> B.b
<property object at ...>
>>> B(a=2)
Traceback (most recent call last):                                                                                                                            
  File "<stdin>", line 1, in <module>                                                                                                                         
  File "<string>", line 4, in __init__                                                                                                                        
AttributeError: property 'b' of 'B' object has no setter      

If i replace definition of b like so b: int = dc.field(default=0), or with b: int = 0 then everything works as expected:

>>> B.b
0
>>> B(a=2)
B(a=2, b=0)

I think it would be nice if dataclass would always create a class attribute (especially if there is explicit assignment!)
so that overloads work as expected.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

@marksandler2 marksandler2 added the type-bug An unexpected behavior, bug, or error label Jul 3, 2024
@sobolevn sobolevn self-assigned this Jul 4, 2024
@sobolevn sobolevn added the stdlib Python modules in the Lib dir label Jul 4, 2024
@sobolevn
Copy link
Member

sobolevn commented Jul 4, 2024

Good catch, thank you! I will take a look.

@sobolevn
Copy link
Member

sobolevn commented Jul 5, 2024

I reproduced the same behaviour without dataclasses:

class A1:
  def __init__(self, a):
    self.a = a

  @property
  def b(self) -> int:
    return 1

class B1(A1):
  def __init__(self, a, b=0) -> None:
    self.a = a
    self.b = b

print(B1.b)
print(B1(1).b)

produces:

<property object at 0x104fa6f20>
Traceback (most recent call last):
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 17, in <module>
    print(B1(1).b)
          ~~^^^
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 14, in __init__
    self.b = b
    ^^^^^^
AttributeError: property 'b' of 'B1' object has no setter

So, I think that there's no actual bug. When dataclasses behave as regular python's classes - it is expected.

For your case you can do:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value

@sobolevn sobolevn closed this as not planned Won't fix, can't repro, duplicate, stale Jul 5, 2024
@sobolevn sobolevn removed type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jul 5, 2024
@sobolevn
Copy link
Member

sobolevn commented Jul 5, 2024

Oh, b: int = dc.field(default=0) makes it a bit different.
Sorry, I've missed this part :)

@sobolevn sobolevn reopened this Jul 5, 2024
@sobolevn sobolevn added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jul 5, 2024
@sobolevn
Copy link
Member

sobolevn commented Jul 5, 2024

b: int = dc.field(default=0) works, because it explicitly overrides a class-attribute with the default value:

setattr(cls, f.name, f.default)

I think that what can be done for this case: we can check that if a field overrides a property, then we can delete that property from our class namespace.

This way both b: int = dc.field(default=0) and b: int = dc.field(default_factory=lambda: 0) would work the same way. Right now it is indeed confusing.

@sobolevn
Copy link
Member

sobolevn commented Jul 5, 2024

Or we can leave this as-is, because:

  • It will be quite time-consuming to perform this mro traversal check for props
  • It is a very niche use-case
  • It works kinda similar with regular classes

cc @ericvsmith

@marksandler2
Copy link
Author

@sobolevn Yes, your original example is WAI - since property wasn't overriden. But here we explicity create a new override, so naively i would expect it to override the property definition. E.g. i can override property with another property or another class variable.

Would it make sense for dataclass to just always create a class attribute?

class B(a):
   b: int = dc.field(...)

should have B.b == dc.field, because this is what I would expect from such an assignment. THis way you don't need to do any special handling of properties.

delattr(cls, f.name)

There is a comment in dataclass saying that if defautl_factory is specified "it should not be set at all in the post-processsed class", but it is not clear "why".

  if f.default is MISSING:
                # If there's no default, delete the class attribute.
                # This happens if we specify field(repr=False), for
                # example (that is, we specified a field object, but
                # no default value).  Also if we're using a default
                # factory.  The class attribute should not be set at
                # all in the post-processed class.
                delattr(cls, f.name)

@marksandler2
Copy link
Author

marksandler2 commented Jul 7, 2024

The example above of the workaround, isn't really very useful:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value

This is not a good solution at all, since i won't be able to pass b=3 in the constructor, and so will need to expose user to the implementation details of the class.

Currently we just switched to use functools.cached_property, which doesn't have this problem and works for our use case

@ericvsmith
Copy link
Member

I think the argument for not setting the class attribute is that field is supposed to be invisible to the class user.

My inclination would be to leave things as-is.

@marksandler2
Copy link
Author

marksandler2 commented Sep 26, 2024 via email

@ericvsmith
Copy link
Member

The argument there is that the default values would be useful, but not the default non-values (that is, field). It's entirely possible this was a bad decision, but it's where we are. I'm worried that if we don't delete the class attribute then something else will break.

@eliegoudout
Copy link

eliegoudout commented Dec 22, 2024

Hello,

I was about to open an issue because I came across this in quite a different context, while realizing dataclass handles non-data descriptors differently from classes. I'm telling you this because the behaviour you're observing has nothing to do with subclassing I think.

>>> desc = lambda: ...  # Random non-data descriptor
>>>
>>> class A:
...  x: ... = desc
... 
>>> A().x
<bound method <lambda> of <__main__.A object at 0x10bea96d0>>
>>> 
>>> @dataclass
... class B:
...  x: ... =  desc
... 
>>> B().x
<function <lambda> at 0x10bcf9f80>

Notice how the return value is "bound" in the vanilla case, and not bound in the dataclass case. This happens because of the parameter look-up mechanism and the fact that dataclass adds statements like self.attr = valto __init__, modifying instance's dict directly:

>>> vars(A())
{}
>>> vars(B())
{'x': <function <lambda> at 0x10bcf9f80>}

The problem now is that for data-descriptors like property(), which implements __set__(), I think the statement self.attr = val will call getattr(type(self), attr).__set__(attr, val), which gives this error:

>>> @dataclass
... class C:
...  x: ... = property(lambda _: ...)
... 
>>> C()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __init__
AttributeError: property 'x' of 'C' object has no setter

To keep consistent behaviour with the look-up that shortcuts __get__ for non-data descriptor, maybe it should shortcut every descriptor both for get and set (and maybe del too?). It would not be an issue for properties defined with def in the classes, since these should never get annotated.

I think this may require a bit of rethinking for dataclasses.py though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-dataclasses type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants