Add a new parser API built on attrs for defining classes instantiated from scanned stanzas #52

@jwodder

A parser will be defined via a class decorated with @parsable. Header fields will be mapped to attributes of the class, with non-trivial mappings defined via field declarations of the form fieldname: Annotation = Field(...).
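A minimal sketch of how such a declaration could look, assuming Field is a thin wrapper around attr.field that stores its parameters in the attribute metadata under a "headerparser" key. None of this is implemented yet: the body of Field, the metadata layout, and the decode_int helper are illustrative guesses, and plain attr.define stands in for @parsable.

```python
from typing import Any, Callable, Optional

import attr


def Field(
    *,
    alias: Optional[str] = None,
    decoder: Optional[Callable[[str, str], Any]] = None,
    **kwargs: Any,
) -> Any:
    # Hypothetical stand-in for the proposed Field(): keep the
    # headerparser-specific parameters in the attribute metadata and
    # forward everything else to attr.field().
    metadata = dict(kwargs.pop("metadata", None) or {})
    metadata["headerparser"] = {"alias": alias, "decoder": decoder}
    return attr.field(metadata=metadata, **kwargs)


def decode_int(name: str, value: str) -> int:
    # A (name, value) decoder of the kind Field is described as taking.
    return int(value)


@attr.define  # stand-in for @parsable, which does not exist yet
class Message:
    subject: str
    content_length: int = Field(alias="Length", decoder=decode_int, default=0)


# The parsing metadata can be read back from the attrs field definitions,
# which is roughly what @parsable would compile into a ParserSpec.
spec = attr.fields(Message).content_length.metadata["headerparser"]
```

Here spec["alias"] is "Length" and spec["decoder"] is decode_int, so a decorator could collect these per-attribute dicts without any extra registry.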

  • Alternative idea: Replace Field with typing.Annotated à la Pydantic 2.0.

  • Field constructs an attr.Attribute with headerparser-specific parameters stored in the attribute metadata under a "headerparser" key

  • @parsable compiles the class's parsing metadata into a ParserSpec instance and saves it as a class variable, which the actual parse*() functions then use.

  • @parsable can be passed the following arguments:

    • name_decoder — what the v1 parser calls the "normalizer"; defaults to lambda s: re.sub(r'[^\w_]', "_", s.lower())
    • scanner_options: dict[str, Any]
    • **kwargs — passed to attr.define
  • Field: For defining nontrivial multiple=False fields

    • Takes the following arguments:
      • alias
      • decoder — A callable that takes a header name (str) and a value
        • For fields with aliases, this is passed the actual field name, not the alias, since that is what Pydantic does with validators.
      • **kwargs — passed to attr.field
  • MultiField: For defining multiple=True fields

    • Takes the same arguments as Field, except that decoder is passed a header name and a list of values
  • ExtraFields: For defining an attribute on which to store additional fields with multiple=False

    • Takes the following arguments:
      • decoder — a callable that is passed a list of (name, value) pairs with unique names
      • **kwargs — passed to attr.field
    • Extra fields are allowed in the parsed input iff this or MultiExtraFields is present
    • A class cannot have more than one ExtraFields or MultiExtraFields
  • MultiExtraFields: For defining an attribute on which to store additional fields with multiple=True

    • Takes the following arguments:
      • decoder — a callable that is passed a list of (name, value) pairs in which the names need not be unique
      • **kwargs — passed to attr.field
  • BodyField: For defining the attribute on which the body will be stored

    • Takes the following arguments:
      • decoder — a callable that takes just a value
      • **kwargs — passed to attr.field
    • A body is allowed iff such a BodyField is present in the class
    • A class cannot have more than one BodyField
  • Functions:

    • parse(klass: Type[L], data: Union[Iterable[str], str, Scanner]) -> L
    • parse_stanzas(klass: Type[L], data: Union[Iterable[str], str, Scanner]) -> Iterator[L]
    • parse_stream(klass: Type[L], fields: Iterable[Tuple[Optional[str], str]]) -> L
      • There's no point in trying to merge this and parse_stanzas_stream() into the non-stream versions, as either way this function or an equivalent will be needed for the others to call
    • parse_stanzas_stream(klass: Type[L], fields: Iterable[Iterable[Tuple[str, str]]]) -> Iterator[L]
    • There is no parse_next_stanza(); to get this effect, the user should scan the stanza themselves using Scanner and pass the results to parse_stream()
      • Or should parse_next_stanza() exist but only take a Scanner?
    • make_parsable(…) — wraps attr.make_class()
    • is_parsable(Any) -> bool
    • Something (get_scanner()?) for taking a parsable and returning a Scanner initialized with its scanner options?
      • The function would also need to take the data to initialize the Scanner with — unless I give Scanner a feed() method
  • There is a ParserMixin(?) mixin class that implements equivalents of all of the parse*() functions as classmethods that get the klass from cls

  • Supply a premade set of decoders for parsing bools, timestamps, etc.?

  • Supply higher-order functions for converting single-argument functions to (name, value) decoders, converting (name, value) decoders to (name, [value]) decoders, and converting single-argument functions to (name, [value]) decoders

  • Supply one or more equivalents of attrs' pipe() et alii?

  • Add an option for just discarding all extra/unknown fields?
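For reference, the default name_decoder listed above for @parsable behaves as follows. The function name default_name_decoder is only a label for this sketch; the expression itself is the one given in the proposal.

```python
import re


def default_name_decoder(s: str) -> str:
    # Lowercase the header name, then replace every character that is
    # not a word character (letter, digit, or underscore) with "_".
    return re.sub(r"[^\w_]", "_", s.lower())


print(default_name_decoder("Content-Type"))      # content_type
print(default_name_decoder("X-Mailer Version"))  # x_mailer_version
```

Since \w already matches the underscore, the explicit _ in the character class is redundant but harmless.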

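The higher-order decoder helpers mentioned above could be sketched roughly like this. The function names are placeholders, and the issue does not say whether the list variants should decode element-wise; that is assumed here.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def unary_to_decoder(func: Callable[[str], T]) -> Callable[[str, str], T]:
    # Wrap a single-argument function as a (name, value) decoder that
    # ignores the header name.
    def decoder(name: str, value: str) -> T:
        return func(value)

    return decoder


def decoder_to_multi(
    decoder: Callable[[str, str], T],
) -> Callable[[str, List[str]], List[T]]:
    # Wrap a (name, value) decoder as a (name, [value]) decoder by
    # applying it to each value in turn (an assumption, see above).
    def multi_decoder(name: str, values: List[str]) -> List[T]:
        return [decoder(name, v) for v in values]

    return multi_decoder


def unary_to_multi(func: Callable[[str], T]) -> Callable[[str, List[str]], List[T]]:
    # Compose the two adapters: single-argument function to
    # (name, [value]) decoder.
    return decoder_to_multi(unary_to_decoder(func))


parse_length = unary_to_decoder(int)
parse_tags = unary_to_multi(str.strip)
```

With these, parse_length("Length", "42") gives 42 and parse_tags("Tag", [" a ", "b "]) gives ["a", "b"], so plain functions like int can be reused for both Field and MultiField.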