Is it possible to access terminals for unparsed text? #58

@abigailalice

I'm trying to render the original sentences I parse with Greynir verbatim as HTML, while also inserting the additional data (e.g. lemmas, parts of speech) into the HTML. However, it's not clear whether all of the original text is recoverable from Greynir's results. For instance, if a sentence contains multiple consecutive spaces, tidy_text reduces them to a single space, and terminals doesn't show spaces at all.

For comparison, spaCy lets you recover the input text from its output. Is there a way to do this in Greynir, so that I can iterate over terminals and unparsed text together? I did see that periods are stored as a terminal with no category, so presumably raw terminals could be stored the same way, but judging from tidy_text's behaviour I'm assuming the data might not be stored at all.

As a workaround, I'm looking at inserting the missing context back into the results myself. I'm just curious whether there are any methods or attributes that already give me the info I need.
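
A minimal sketch of that kind of workaround, assuming each token's surface text appears verbatim and in order in the original string. That assumption may not hold if the parser normalizes or merges tokens. The helper name `align_tokens` is hypothetical, and `token_texts` is just a plain list of strings here; how to obtain it from Greynir is deliberately left out.

```python
from typing import Iterator, List, Tuple


def align_tokens(original: str, token_texts: List[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (chunk, is_token) pairs that concatenate back to `original`.

    Hypothetical helper: assumes every token text occurs verbatim,
    in order, in `original`. Anything between tokens (e.g. runs of
    spaces) is yielded as a non-token chunk.
    """
    pos = 0
    for tok in token_texts:
        idx = original.find(tok, pos)
        if idx == -1:
            # The token was normalized or altered, so alignment fails.
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        if idx > pos:
            # Text the token list does not cover, kept unchanged.
            yield original[pos:idx], False
        yield tok, True
        pos = idx + len(tok)
    if pos < len(original):
        # Trailing text after the last token.
        yield original[pos:], False


original = "Hún  fór   heim."
chunks = list(align_tokens(original, ["Hún", "fór", "heim", "."]))
# Joining all chunks reproduces the input, including the extra spaces.
assert "".join(text for text, _ in chunks) == original
```

When rendering, the token chunks could be wrapped in markup carrying the lemma or part of speech, while the non-token chunks are emitted as-is.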
