
Feature Engineering: Grammatical Features

Shakleen Ishfar edited this page Jun 9, 2024 · 1 revision

tl;dr: Levenshtein distance is the most useful of the distance measures.

The distance features measure how much each raw sentence in an essay differs from its corrected version. The model used is T5-GEC-small: it corrects the sentences from the essay, and distance measures are then computed between each raw sentence and its correction.
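As a rough illustration of the comparison step, the sketch below computes the four distance measures mentioned on this page for one raw/corrected sentence pair. It assumes the corrected sentence has already been produced by the model; the function names, the character-level treatment of Levenshtein and Hamming, the token-level treatment of Jaccard and cosine, and the handling of unequal lengths in Hamming are all assumptions made for this example, not necessarily what the project does.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Positional mismatches; assumption: extra trailing characters
    of the longer string each count as one mismatch."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over whitespace-separated token sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity over token count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    if norm == 0:
        return 0.0 if not ca and not cb else 1.0
    return 1.0 - dot / norm


raw = "He go to school"
corrected = "He goes to school"  # stand-in for the model's output
features = {
    "levenshtein": levenshtein(raw, corrected),
    "hamming": hamming(raw, corrected),
    "jaccard": jaccard_distance(raw, corrected),
    "cosine": cosine_distance(raw, corrected),
}
```

A sentence the model leaves unchanged scores zero on every measure, so larger values indicate more correction.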

[Figure: grammar_levenshtein_distance]

In terms of feature importance, Levenshtein distance is more effective than Hamming, cosine, and Jaccard distance.

[Figure: grammar_operation_overview]

Among the operations, sum and max are more important than the others.

[Figure: grammar_braod_importances]
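One plausible reading of the operations is that per-sentence distances are reduced to essay-level features. The sketch below shows that reduction under this assumption; the helper name `aggregate_distances`, the feature-name pattern, and the inclusion of mean and min as the "other" operations are hypothetical and chosen only for illustration.

```python
from statistics import mean


def aggregate_distances(distances, prefix):
    """Reduce a list of per-sentence distances to essay-level features.

    Assumption: an essay with no sentences gets 0.0 for every feature.
    """
    ops = ("sum", "max", "mean", "min")
    if not distances:
        return {f"{prefix}_{op}": 0.0 for op in ops}
    return {
        f"{prefix}_sum": float(sum(distances)),
        f"{prefix}_max": float(max(distances)),
        f"{prefix}_mean": float(mean(distances)),
        f"{prefix}_min": float(min(distances)),
    }


# Hypothetical Levenshtein distances for the three sentences of one essay.
per_sentence = [2, 0, 5]
essay_features = aggregate_distances(per_sentence, "grammar_levenshtein")
```

Sum grows with the total amount of correction in the essay, while max captures the single worst sentence.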
