Descriptor support in msgspec.Struct / dataclasses #864

@msrasheed

Description

msgspec.Struct and dataclasses differ in how they behave with descriptors.

From the dataclass documentation:

Fields that are assigned descriptor objects as their default value have the following special behaviors:

  • The value for the field passed to the dataclass’s __init__() method is passed to the descriptor’s __set__() method rather than overwriting the descriptor object.
  • Similarly, when getting or setting the field, the descriptor’s __get__() or __set__() method is called rather than returning or overwriting the descriptor object.
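
To make the quoted dataclass behavior concrete, here is a minimal stdlib-only sketch. The `Doubled` descriptor and the `Sample` class are hypothetical names made up for illustration; they are not part of msgspec or of the example further down.

```python
from dataclasses import dataclass


class Doubled:
    """Hypothetical data descriptor that stores twice the assigned value."""

    def __set_name__(self, owner, name):
        # Keep the real value under a private name in the instance __dict__.
        self._storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class-level access: dataclasses picks this up as the field default.
            return 0
        return getattr(instance, self._storage)

    def __set__(self, instance, value):
        setattr(instance, self._storage, value * 2)


@dataclass
class Sample:
    x: int = Doubled()


s = Sample(x=5)      # the generated __init__ routes 5 through Doubled.__set__
default = Sample()   # the default 0 also goes through Doubled.__set__

print(s.x)           # 10
print(default.x)     # 0
# The descriptor object is still installed on the class, not overwritten.
print(type(Sample.__dict__["x"]).__name__)  # Doubled
```

The point of the sketch is that the constructor argument never replaces the descriptor: it is handed to `__set__`, and reads go through `__get__`.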

Currently the msgspec documentation makes no mention of how Structs behave with descriptors. From messing around with it, it seems msgspec just overwrites the descriptor object with whatever value was passed into the constructor, so the descriptor's __get__() and __set__() are never called.

Here's an example:

from typing import Any
import msgspec

class AnObj:
    def __init__(self, name, val):
        self._name = name
        self._val = val

class AClass:
    def __init__(self, name: str):
        self.name = name
        self._attr_name = "__art_prop"

    def __get__(self, instance, owner) -> AnObj:
        print(f"Getting {self._attr_name}")
        return AnObj(self.name, getattr(instance, self._attr_name, None))

    def __set__(self, instance, value: Any):
        print(f"Setting {self._attr_name} to {value}")
        setattr(instance, self._attr_name, value)

class MeValue(msgspec.Struct):

    aval: int
    afile: AClass = AClass("testfile")

    @classmethod
    def enc_hook(cls, obj: Any) -> Any:
        """Custom serialization hook."""
        print("enc_hook", type(obj))
        if isinstance(obj, AnObj):
            return f"AnObj({obj._name}, {obj._val})"
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

    def serialize(self) -> bytes:
        """Serialize the object to json bytes."""
        return msgspec.json.Encoder(enc_hook=self.enc_hook).encode(self)

value = MeValue(aval=1, afile=1)

print(value.serialize())

I'd like it to return

b'{"aval":1,"afile":"AnObj(testfile, 1)"}'

on serialization. Currently I get

b'{"aval":1,"afile":1}'

A similar issue arises when trying to serialize dataclasses containing descriptors with the msgspec encoder, since the encoder looks at the dataclass instance's __dict__ rather than at the fields returned by dataclasses.fields().
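
The difference between those two views can be shown with the standard library alone. The sketch below uses hypothetical names (`Tagged`, `Record`) and does not call msgspec; it only contrasts what an instance's `__dict__` holds with what going through `dataclasses.fields()` and `getattr()` returns. How the msgspec encoder actually reads attributes is as described above, not something this snippet verifies.

```python
import dataclasses


class Tagged:
    """Hypothetical descriptor: stores the raw value, returns a tagged string."""

    def __set_name__(self, owner, name):
        self._storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return None  # picked up by dataclasses as the field default
        return f"tagged({getattr(instance, self._storage)})"

    def __set__(self, instance, value):
        setattr(instance, self._storage, value)


@dataclasses.dataclass
class Record:
    aval: int
    afile: object = Tagged()


r = Record(aval=1, afile=1)

# View 1: the raw instance __dict__ (private storage key, raw value).
from_dict = dict(vars(r))
# View 2: declared fields read through getattr, which invokes __get__.
from_fields = {f.name: getattr(r, f.name) for f in dataclasses.fields(r)}

print(from_dict)    # {'aval': 1, '_afile': 1}
print(from_fields)  # {'aval': 1, 'afile': 'tagged(1)'}
```

An encoder that walks `__dict__` would see the first mapping, where the `afile` key does not even exist, while one that walks the declared fields would see the second.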

I understand that calling into Python for attribute access will impact performance, so this may be something the library is against doing. Let me know your thoughts. At the very least, maybe the docs can be updated with a note about descriptor behavior.
