
KAT5: Knowledge-Aware Text-to-Text Transfer Transformer

We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5) by leveraging a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning such as T5, a model is first pre-trained on an unsupervised task with a language model objective before being fine-tuned on a downstream task. T5 explores several learning objectives, including masked language modeling (MLM), random span corruption, and deshuffling, but the model has limited means of integrating knowledge during pre-training. Here, we push the limits of this model by grafting knowledge such as entity and co-reference information, obtained by mapping Wikipedia to Wikidata, into pre-training. We construct large-scale alignments between Wikipedia abstracts and Wikidata triples to facilitate pre-training the KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular on entity and relation extraction (CoNLL04, ADE, and NYT datasets) and on language generation tasks, including abstractive summarization (XSum, CNN/DM) and machine translation.
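The repository does not spell out the data-creation step in this README, so the following is a minimal, illustrative Python sketch of the general idea only: align Wikidata triples with entity mentions in a Wikipedia abstract, mask the aligned mentions with T5-style sentinel tokens, and serialize the triples as extra knowledge context on the encoder side. The names (Triple, build_knowledge_aware_example) and the serialization format are assumptions for illustration, not the actual KAT5 pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Triple:
    """A Wikidata-style (subject, relation, object) fact."""
    subject: str
    relation: str
    obj: str


def mask_entity_spans(text: str, entities: List[str]) -> Tuple[str, str]:
    """Replace each aligned entity mention with a T5-style sentinel token
    and collect the masked mentions as the decoding target."""
    masked = text
    target_pieces = []
    sentinel_id = 0
    for ent in entities:
        if ent in masked:
            sentinel = f"<extra_id_{sentinel_id}>"
            masked = masked.replace(ent, sentinel, 1)
            target_pieces.append(f"{sentinel} {ent}")
            sentinel_id += 1
    # Final sentinel closes the target sequence, as in T5 span corruption.
    target_pieces.append(f"<extra_id_{sentinel_id}>")
    return masked, " ".join(target_pieces)


def build_knowledge_aware_example(abstract: str, triples: List[Triple]) -> Tuple[str, str]:
    """Assemble one (input, target) pre-training pair: mentions aligned to
    Wikidata triples are masked, and the triples are serialized and appended
    to the encoder input as knowledge context (hypothetical format)."""
    entities = [t.subject for t in triples] + [t.obj for t in triples]
    masked, target = mask_entity_spans(abstract, entities)
    kb_context = " ".join(f"[{t.subject} | {t.relation} | {t.obj}]" for t in triples)
    return f"{masked} knowledge: {kb_context}", target


if __name__ == "__main__":
    abstract = "Barack Obama served as the 44th president of the United States."
    triples = [Triple("Barack Obama", "position held", "president of the United States")]
    src, tgt = build_knowledge_aware_example(abstract, triples)
    print("input :", src)
    print("target:", tgt)
```

In the actual setting, such aligned examples would be generated at Wikipedia scale and fed to a standard T5 pre-training loop; this sketch only shows how a single input/target pair might be assembled.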

Knowledge-aware Pre-training Data Creation Architecture


KAT5 Pre-training Architecture


KAT5 Fine-tuning Architecture


Scripts

Our scripts a-job-pretraining.sh and a-job-finetuning.sh are mainly intended for launching multi-node pre-training and fine-tuning on the ABCI computation cluster.

Publications

Mohammad Golam Sohrab, Makoto Miwa (2024). KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14949. Springer, Cham. https://doi.org/10.1007/978-3-031-70378-2_10

@InProceedings{10.1007/978-3-031-70378-2_10,
author="Sohrab, Mohammad Golam
and Miwa, Makoto",
editor="Bifet, Albert
and Krilavi{\v{c}}ius, Tomas
and Miliou, Ioanna
and Nowaczyk, Slawomir",
title="KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer",
booktitle="Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="157--173",
abstract="We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5) by leveraging a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning like T5, a model is first pre-trained on an unsupervised data task with a language model objective before fine-tuning it on a downstream task. T5 explores several learning objectives, including masked language model (MLM), random span, and deshuffling, where the model is limited to exploring integrating knowledge during pre-training. Here, we push the limits of this model by grafting knowledge like entity and co-reference information by mapping Wikipedia and Wikidata during pre-training. We align large-scale alignments between Wikipedia abstract and Wikidata triples to facilitate our pre-training KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular in entity and relation extraction (CoNLL04, ADE, and NYT datasets), and language generation tasks, including abstractive summarization (XSum, CNNDM), and machine translation. Our code is publicly released on GitHub (https://github.com/aistairc/kat5) under the Apache 2.0 License.",
isbn="978-3-031-70378-2"
}

Acknowledgment

This research is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
