
Data Version History

Shakleen Ishfar edited this page May 21, 2024 · 5 revisions

Data Versions

| Version | Data Source | Folds | Split Strategy | Negative Sampling | Sliding Window | Max Length | Processing |
|---|---|---|---|---|---|---|---|
| 1 | Competition | 7 | StratifiedGroupKFold | No | No | 512 | Strip extra spaces |
| 2 | Competition | 7 | StratifiedGroupKFold | No | Yes | 512 | Newlines to spaces; multiple spaces to a single space |
| 3 | Competition | 7 | StratifiedGroupKFold | Yes | Yes | 512 | Same as 2 |
| 4 | Competition + Persuade 2.0 | 7 | StratifiedGroupKFold | No | Yes | 512 | Same as 2 |
| 5 | Competition + Persuade 2.0 | 7 | StratifiedGroupKFold | Yes | Yes | 512 | Same as 2 |
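The processing and sliding-window columns can be illustrated with a rough sketch. This is not the project's actual code: the function names and the `STRIDE` value are hypothetical (the page does not state the window overlap), and only the maximum length of 512 comes from the table.

```python
import re

MAX_LENGTH = 512  # from the table above
STRIDE = 384      # hypothetical: the overlap between windows is not documented


def normalize(text):
    """Version 2+ processing: newlines to spaces, runs of spaces to one space."""
    text = text.replace("\n", " ")
    return re.sub(r" {2,}", " ", text)


def sliding_windows(token_ids, max_length=MAX_LENGTH, stride=STRIDE):
    """Split a token sequence into overlapping windows of at most max_length."""
    if len(token_ids) <= max_length:
        return [token_ids]
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_length])
        # Stop once the current window reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += stride
    return windows
```

With these assumed values, a 1000-token document yields three windows, and the last one is shorter than 512 tokens.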

Tokenizer Versions

Version 1

Base tokenizer with no added tokens.

Version 2

Base tokenizer with two added tokens:

  1. New line token (\n)
  2. Double-space token (two consecutive spaces)
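A minimal sketch of how the two tokens might be added with Hugging Face `transformers`. The checkpoint name `microsoft/deberta-v3-base` and the helper `build_tokenizer_v2` are assumptions for illustration; the page only says "base tokenizer".

```python
NEW_TOKENS = ["\n", "  "]  # new line and double space, as listed above
CHECKPOINT = "microsoft/deberta-v3-base"  # assumed checkpoint, not stated on this page


def build_tokenizer_v2(checkpoint=CHECKPOINT):
    """Load the base tokenizer and register the two extra tokens."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AddedToken, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # normalized=False asks the tokenizer to match the raw characters
    # instead of a normalized form of the whitespace.
    tokenizer.add_tokens([AddedToken(tok, normalized=False) for tok in NEW_TOKENS])
    return tokenizer
```

After adding tokens, the model's embedding matrix usually has to grow to match, for example with `model.resize_token_embeddings(len(tokenizer))`.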

Model Versions

Version 1

DeBERTa-V3 model with mean pooling for classification.
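Mean pooling averages the token embeddings of a sequence while ignoring padding positions. The sketch below shows the idea for a single sequence in plain Python; the function name and the list-based inputs are illustrative assumptions, and the real model presumably does this on batched tensors.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors where attention_mask is 1 (padding is skipped).

    hidden_states: list of token vectors, each a list of floats.
    attention_mask: list of 0/1 flags, one per token.
    """
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Guard against division by zero when every position is masked.
    count = max(count, 1)
    return [total / count for total in totals]
```

The pooled vector would then feed a classification head, which is not shown here.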
