Description
msgspec.Struct and dataclasses differ in how they behave with descriptors.
From the dataclass documentation:
Fields that are assigned descriptor objects as their default value have the following special behaviors:
- The value for the field passed to the dataclass’s __init__() method is passed to the descriptor’s __set__() method rather than overwriting the descriptor object.
- Similarly, when getting or setting the field, the descriptor’s __get__() or __set__() method is called rather than returning or overwriting the descriptor object.
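For reference, the dataclass behavior quoted above can be sketched with the standard library alone. `Doubled` and `Point` are hypothetical names made up for this illustration; they are not part of the example in this issue.

```python
from dataclasses import dataclass


class Doubled:
    """Hypothetical descriptor: stores the raw value, returns it doubled."""

    def __set_name__(self, owner, name):
        # Keep the raw value under a private attribute on the instance.
        self._attr = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: dataclasses uses this as the field default.
            return 0
        return getattr(instance, self._attr) * 2

    def __set__(self, instance, value):
        setattr(instance, self._attr, value)


@dataclass
class Point:
    x: int = Doubled()


p = Point(x=3)  # the generated __init__ routes 3 through Doubled.__set__
print(p.x)      # reading goes through Doubled.__get__
print(type(Point.__dict__["x"]).__name__)  # the descriptor is still on the class
```

The descriptor object survives on the class, and both construction and attribute access go through it, which is the behavior I expected from msgspec.Struct as well.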
Currently the msgspec documentation has no mention of the behavior with descriptors. From messing around with it, it seems msgspec just overwrites the descriptor object with whatever was passed into the constructor.
Here's an example:
from typing import Any

import msgspec


class AnObj:
    def __init__(self, name, val):
        self._name = name
        self._val = val


class AClass:
    """Descriptor that wraps the stored value in an AnObj on access."""

    def __init__(self, name: str):
        self.name = name
        self._attr_name = "__art_prop"

    def __get__(self, instance, owner) -> AnObj:
        print(f"Getting {self._attr_name}")
        return AnObj(self.name, getattr(instance, self._attr_name, None))

    def __set__(self, instance, value: Any):
        print(f"Setting {self._attr_name} to {value}")
        setattr(instance, self._attr_name, value)


class MeValue(msgspec.Struct):
    aval: int
    afile: AClass = AClass("testfile")

    @classmethod
    def enc_hook(cls, obj: Any) -> Any:
        """Custom serialization hook."""
        print("enc_hook", type(obj))
        if isinstance(obj, AnObj):
            return f"AnObj({obj._name}, {obj._val})"
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

    def serialize(self) -> bytes:
        """Serialize the object to json bytes."""
        return msgspec.json.Encoder(enc_hook=self.enc_hook).encode(self)


value = MeValue(aval=1, afile=1)
print(value.serialize())
I'd like it to return
b'{"aval":1, "afile": "AnObj(testfile, 1)"}'
on serialization. Currently I get
b'{"aval":1,"afile":1}'
A similar issue arises when trying to serialize dataclasses containing descriptors with the msgspec encoder, since the encoder takes a look at the dataclass object's __dict__, and not the fields returned by dataclasses.fields.
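To show why the two views differ, here is a standard-library-only sketch. It does not call msgspec or reproduce its encoder; it only compares what the instance `__dict__` holds with what `dataclasses.fields` reports. `Upper` and `Record` are hypothetical names for this illustration.

```python
from dataclasses import dataclass, fields


class Upper:
    """Hypothetical descriptor: stores the raw string, returns it upper-cased."""

    def __set_name__(self, owner, name):
        self._attr = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return ""  # used by dataclasses as the field default
        return getattr(instance, self._attr).upper()

    def __set__(self, instance, value):
        setattr(instance, self._attr, value)


@dataclass
class Record:
    count: int
    label: str = Upper()


r = Record(count=1, label="abc")

# View 1: the raw instance __dict__, which only holds the private backing attribute.
by_dict = dict(vars(r))

# View 2: the declared fields, read through normal attribute access.
by_fields = {f.name: getattr(r, f.name) for f in fields(r)}

print(by_dict)
print(by_fields)
```

Anything that walks `__dict__` sees `_label` with the raw value and never sees `label`, while going through `dataclasses.fields` plus `getattr` triggers the descriptor.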
I understand that calling into Python for attribute accesses will impact performance, so this may be something the library is against doing. Let me know your thoughts. At the very least, maybe the docs can be updated with a note about descriptor behavior.