Knowledge-Aware Text-to-Text Transfer Transformer (KAT5)

We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5) by leveraging a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning such as T5, a model is first pre-trained on an unsupervised task with a language model objective and then fine-tuned on a downstream task. T5 explores several learning objectives, including masked language modeling (MLM), random span corruption, and deshuffling, but the model has limited means of integrating knowledge during pre-training. Here, we push the limits of this model by grafting knowledge such as entity and co-reference information, obtained by mapping Wikipedia to Wikidata, into pre-training. We build large-scale alignments between Wikipedia abstracts and Wikidata triples to facilitate pre-training of the KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular on entity and relation extraction (the CoNLL04, ADE, and NYT datasets) and on language generation tasks, including abstractive summarization (XSum, CNNDM) and machine translation.
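To make the pre-training idea concrete, below is a minimal, hypothetical sketch of how one aligned pair (a Wikipedia abstract plus its Wikidata triples) could be converted into a T5-style sentinel-masked input/target example. The span selection, the "knowledge:" prefix, and the triple serialization are illustrative assumptions, not the exact KAT5 pre-training procedure.

```python
# A minimal sketch: mask aligned entity mentions with T5 sentinel tokens and
# append the aligned Wikidata triples to the input as explicit knowledge.
# The formatting choices here are assumptions for illustration only.

def build_entity_masked_pair(text, entity_spans, triples):
    """Return a (source, target) pair in T5 span-corruption style."""
    input_parts, target_parts = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(entity_spans)):
        sentinel = f"<extra_id_{i}>"
        input_parts.append(text[cursor:start] + sentinel)
        target_parts.append(f"{sentinel} {text[start:end]}")
        cursor = end
    input_parts.append(text[cursor:])
    target_parts.append(f"<extra_id_{len(entity_spans)}>")  # closing sentinel, as in T5
    # Serialize the aligned Wikidata triples as plain text.
    knowledge = " ".join(f"[{s} | {p} | {o}]" for s, p, o in triples)
    source = "".join(input_parts) + " knowledge: " + knowledge
    target = " ".join(target_parts)
    return source, target

abstract = "Barack Obama served as the 44th president of the United States."
spans = [(0, 12), (49, 62)]  # character offsets of "Barack Obama", "United States"
triples = [("Barack Obama", "position held", "President of the United States")]
print(build_entity_masked_pair(abstract, spans, triples))
```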
Our scripts a-job-pretraining.sh and a-job-finetuning.sh were mainly prepared for launching multi-node pre-training and fine-tuning on the ABCI computation cluster.
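For a quick single-GPU sanity check outside the ABCI job scripts, the text-to-text fine-tuning setup can be sketched with the Hugging Face transformers API. This is only a sketch under stated assumptions: "t5-base" is a stand-in for a local KAT5 checkpoint, and the relation-extraction prefix and triple linearization are hypothetical, not the exact format used by this repository.

```python
# A minimal text-to-text fine-tuning step on one toy relation-extraction
# example, assuming a T5-compatible checkpoint loadable with transformers.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model_name = "t5-base"  # stand-in; replace with a local KAT5 checkpoint path
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Hypothetical linearization of a CoNLL04-style example as text-to-text.
source = "relation extraction: John Smith works for Acme Corp in Boston ."
target = "(John Smith, Work_For, Acme Corp) (Acme Corp, OrgBased_In, Boston)"

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
loss = model(**inputs, labels=labels).loss  # teacher-forced cross-entropy
loss.backward()
optimizer.step()
```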
Mohammad Golam Sohrab, Makoto Miwa (2024). KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14949. Springer, Cham. https://doi.org/10.1007/978-3-031-70378-2_10
@InProceedings{10.1007/978-3-031-70378-2_10,
author="Sohrab, Mohammad Golam
and Miwa, Makoto",
editor="Bifet, Albert
and Krilavi{\v{c}}ius, Tomas
and Miliou, Ioanna
and Nowaczyk, Slawomir",
title="KAT5: Knowledge-Aware Transfer Learning with a Text-to-Text Transfer Transformer",
booktitle="Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track",
year="2024",
publisher="Springer Nature Switzerland",
address="Cham",
pages="157--173",
abstract="We introduce knowledge-aware transfer learning with a text-to-text transfer transformer (KAT5) by leveraging a text-to-text transfer transformer (T5) in the Wikipedia domain. In standard transfer learning like T5, a model is first pre-trained on an unsupervised data task with a language model objective before fine-tuning it on a downstream task. T5 explores several learning objectives, including masked language model (MLM), random span, and deshuffling, where the model is limited to exploring integrating knowledge during pre-training. Here, we push the limits of this model by grafting knowledge like entity and co-reference information by mapping Wikipedia and Wikidata during pre-training. We align large-scale alignments between Wikipedia abstract and Wikidata triples to facilitate our pre-training KAT5 model. Our approach can match or outperform task-specific models while using the same architecture and hyper-parameters, in particular in entity and relation extraction (CoNLL04, ADE, and NYT datasets), and language generation tasks, including abstractive summarization (XSum, CNNDM), and machine translation. Our code is publicly released on GitHub (https://github.com/aistairc/kat5) under the Apache 2.0 License.",
isbn="978-3-031-70378-2"
}
This research is based on results obtained from a project JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).