
medxiaorudan/LLM_NER_MultiNERD


LLM_NER_MultiNERD uses the MultiNERD Named Entity Recognition (NER) dataset to train and evaluate NER models for English with BERT and XLNet, following the instructions below. Built on top of the familiar 🤗 Transformers library.

Instructions:

System A

Fine-tune the chosen models, bert-base-cased and xlnet-base-cased, on the English subset of the training set.

System B

Train a model that predicts only five entity types plus the O tag (i.e. not part of an entity). The necessary pre-processing steps should therefore be performed on the dataset: all examples remain, but entity types not belonging to one of the following five are mapped to the O tag: PERSON (PER), ORGANIZATION (ORG), LOCATION (LOC), DISEASES (DIS), ANIMAL (ANIM). Fine-tune the chosen models on the filtered dataset.
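
For illustration, here is a minimal sketch of this label filtering with 🤗 Datasets. It assumes the Babelscape/multinerd dataset on the Hub, a "lang" column for selecting English, and that ner_tags is a ClassLabel feature with O at index 0; check the dataset card for the exact columns and tag inventory.

from datasets import load_dataset

# Assumed dataset id and column names; adjust to the actual MultiNERD release.
dataset = load_dataset("Babelscape/multinerd")
english = dataset.filter(lambda ex: ex["lang"] == "en")

label_names = english["train"].features["ner_tags"].feature.names
kept_types = {"PER", "ORG", "LOC", "DIS", "ANIM"}

def keep_five_types(example):
    # Map any tag whose entity type is not in kept_types to O (index 0).
    example["ner_tags"] = [
        tag if label_names[tag].split("-")[-1] in kept_types else 0
        for tag in example["ner_tags"]
    ]
    return example

english = english.map(keep_five_types)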

BERT

BERT (Bidirectional Encoder Representations from Transformers) employs a bidirectional attention mechanism to capture contextual information from both left and right contexts. It uses pre-training tasks, such as masked language modeling, to learn contextualized embeddings.

XLNet

XLNet improves upon BERT by introducing permutation language modeling. It captures bidirectional context like BERT but allows for a more flexible information flow. In Named Entity Recognition (NER) tasks, these models excel at understanding the relationships between words and recognizing entities such as persons, organizations, and locations. Their deep contextual embeddings enable them to capture nuanced patterns, improving accuracy in identifying named entities within text.
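
Both backbones plug into the same token-classification interface in 🤗 Transformers, so swapping BERT for XLNet is a one-line change. A minimal sketch (num_labels=11 is an assumption: System B's five entity types in BIO form plus O):

from transformers import AutoModelForTokenClassification

# Either checkpoint gets a fresh token-classification head on top.
for ckpt in ("bert-base-cased", "xlnet-base-cased"):
    model = AutoModelForTokenClassification.from_pretrained(ckpt, num_labels=11)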

Setting up the Docker environment and installing the dependencies

Go to the docker/ folder.

# Docker image names must be lowercase.
docker build -f Dockerfile -t ner-multinerd \
    --build-arg username=$USER .
docker run -it --shm-size 60G --gpus all \
    -v /path/to/dir/:/home/username/NER-MultiNERD/ \
    -v /path/to/storage/:/storage/ ner-multinerd

Inside the environment, install the dependencies:

pip install -r requirements.txt

Input Format

The input uses the BIO tagging scheme, with one token and its label per line. Sentences are separated by a blank line.
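
An illustrative sentence in this format (made-up tokens and labels):

SpaceX B-ORG
was O
founded O
by O
Elon B-PER
Musk I-PER
. O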

Run the code for finetuning

To fine-tune BERT for System A:

python main_A.py --MODEL_CKPT bert-base-cased

To fine-tune BERT for System B:

python main_B.py --MODEL_CKPT bert-base-cased

To fine-tune XLNet for System A:

python main_A.py --MODEL_CKPT xlnet-base-cased

To fine-tune XLNet for System B:

python main_B.py --MODEL_CKPT xlnet-base-cased

Load Fine-tuned Models directly from Hugging Face 🤗

I have uploaded the fine-tuned models to Hugging Face, so you can load them or run inference through the API directly. Here is an example.

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("medxiaorudan/bert-base-cased-finetuned-MultiNERD-SystemA")
model = AutoModelForTokenClassification.from_pretrained("medxiaorudan/bert-base-cased-finetuned-MultiNERD-SystemA")
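
To turn raw token predictions into entity spans, you can wrap the loaded model in a token-classification pipeline (a standard Transformers pattern; the input sentence is just an example):

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge sub-word pieces into whole entities
)
print(ner("Barack Obama visited Paris."))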

Evaluation results

The overall performance of BERT and XLNet on the dev set is shown below (more detailed validation results and visualizations can be found in the notebooks):

Model              Accuracy (entity)   Recall (entity)   Precision (entity)   F1 score (entity)
BERT + System A    0.9861              0.9685            0.8699               0.9165
BERT + System B    0.9922              0.9740            0.9206               0.9466
XLNet + System A   0.9759              0.9548            0.7967               0.8687
XLNet + System B   0.9915              0.9741            0.9145               0.9434
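
Entity-level metrics of this kind are commonly computed with seqeval over BIO label sequences; a minimal sketch with made-up data (seqeval is an assumption here, not necessarily what this repo uses):

from seqeval.metrics import f1_score, precision_score, recall_score

# Toy gold and predicted BIO sequences, for illustration only.
y_true = [["B-PER", "I-PER", "O", "B-ORG"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

print(precision_score(y_true, y_pred))  # 1.0: the one predicted entity is correct
print(recall_score(y_true, y_pred))     # 0.5: one of two gold entities found
print(f1_score(y_true, y_pred))         # 0.667: harmonic mean of the two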
