
C147/247 Final Project

Rathul Anand

Leon Liu

Abstract

This project investigates the effectiveness of various sequence processing architectures, specifically Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs), Long Short-Term Memory networks (LSTMs), and Transformers, in decoding surface electromyography (sEMG) signals into the corresponding key press sequences. These signals are collected from wearable wrist sensors during touch typing on a QWERTY keyboard. Leveraging the emg2qwerty dataset, we aim to determine which sequence processing architecture offers superior performance in decoding sEMG signals for accurate text input recognition when trained from random initialization on a single user's data. Our findings show that transformer and recurrent architectures with more flexible receptive fields significantly outperform the baseline Time-Depth Separable ConvNet, challenging the hypothesis that fixed receptive fields are necessary for this task. Additionally, we demonstrate that log spectrograms can be down-sampled more aggressively (down to 50 Hz from the baseline 125 Hz) without significant performance degradation, which helps offset the computational cost of these more powerful sequence processing architectures and improves efficiency over the previous baseline. Finally, we explore the impact of training sequence length on generalization, revealing that transformer models struggle to generalize to longer sequences under limited training compute and single-user data.

Winter 2025 - Professor Jonathan Kao

This course project is built upon the emg2qwerty work from Meta. The first section of this README provides some guidance for working with the repo and contains a running list of FAQs. Note that the rest of the README is from the original repo and we encourage you to take a look at their work.

Guiding Tips + FAQs

Last updated 2/13/2025

  • Read through the Project Guidelines to ensure that you have a clear understanding of what we expect
  • Familiarize yourself with the prediction task and get a high-level understanding of their base architecture (it would be beneficial to read about CTC loss)
  • Get comfortable with the codebase
    • lightning.py + modules.py - where most of your model architecture development will take place
    • data.py - defines PyTorch dataset (likely will not need to touch this much)
    • transforms.py - implement more data transforms and other preprocessing techniques
    • config/*.yaml - modify model hyperparameters and PyTorch Lightning training configuration
      • Q: How do we update these configuration files? A: YAML files consist of basic key-value pairs (i.e. <key>: <value>) arranged hierarchically. So, for instance, if we wanted to update the mlp_features hyperparameter of the TDSConvCTCModule, we would change the value at line 5 of config/model/tds_conv_ctc.yaml (under module); see the sketch after this list. Read more details here.
      • Q: Where do we configure data splitting? A: Refer to config/user/single_user.yaml. Be careful with your edits, so that you don't accidentally move the test data into your training set.
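
For reference, here is a rough sketch of the relevant block of config/model/tds_conv_ctc.yaml (the exact line numbers, values, and neighboring keys may differ in your checkout; treat this as illustrative rather than authoritative):

module:
  _target_: emg2qwerty.lightning.TDSConvCTCModule
  mlp_features:
    - 384   # e.g. change this value to resize the MLP hidden layer

Hydra also lets you override such values from the command line, e.g. by appending something like module.mlp_features=[512] to the training command (assuming the module block sits at the top level of the composed config).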

emg2qwerty

[ Paper ] [ Dataset ] [ Blog ] [ BibTeX ]

A dataset of surface electromyography (sEMG) recordings while touch typing on a QWERTY keyboard with ground-truth, benchmarks and baselines.


WandB Setup

Include a +exp_name="<name>" argument for the experiment name; it is used in base.yaml to initialize the experiment name for the WandbLogger. Also update the entity and project directly in the config as appropriate.
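
A rough sketch of what the logger block in config/base.yaml might look like (the exact keys, nesting, and defaults are assumptions for this fork; project and entity below are placeholders for your own WandB project and account):

logger:
  _target_: pytorch_lightning.loggers.WandbLogger
  project: emg2qwerty        # replace with your WandB project
  entity: your-wandb-entity  # replace with your WandB username or team
  name: ${exp_name}          # populated by the +exp_name override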

CUDA_VISIBLE_DEVICES=2 python -m emg2qwerty.train user="single_user" trainer.accelerator=gpu trainer.devices=1 +exp_name="encoder_small" model="transformer_encoder_ctc_small" > logs/stdout3.log 2>&1 

Eval (set train: False in base.yaml and load a checkpoint)

CUDA_VISIBLE_DEVICES=0 python -m emg2qwerty.train user="single_user" trainer.accelerator=gpu trainer.devices=1 +exp_name="rot_encoder_small" model="roformer_encoder_ctc_small" train=False > logs/eval1.log 2>&1

Train

CUDA_VISIBLE_DEVICES=0 python -m emg2qwerty.train user="single_user" trainer.accelerator=gpu trainer.devices=1 +exp_name="rot_encoder_small" model="roformer_encoder_ctc_small" train=True > logs/rot1.log 2>&1

We also provide shell scripts for some of our experimental configurations. Note that if you intend to use eval.sh to evaluate models trained with different data preprocessing (window size / hop length), you should also modify eval.sh so that the eval datasets are loaded with the corresponding preprocessing configuration.

Setup

# Install [git-lfs](https://git-lfs.github.com/) (for pretrained checkpoints)
git lfs install

# Clone the repo, setup environment, and install local package
git clone git@github.com:facebookresearch/emg2qwerty.git ~/emg2qwerty
cd ~/emg2qwerty
conda env create -f environment.yml
conda activate emg2qwerty
pip install -e .

# Download the dataset, extract, and symlink to ~/emg2qwerty/data
cd ~ && wget https://fb-ctrl-oss.s3.amazonaws.com/emg2qwerty/emg2qwerty-data-2021-08.tar.gz
tar -xvzf emg2qwerty-data-2021-08.tar.gz
ln -s ~/emg2qwerty-data-2021-08 ~/emg2qwerty/data

Data

The dataset consists of 1,136 files in total - 1,135 session files spanning 108 users and 346 hours of recording, and one metadata.csv file. Each session file is in a simple HDF5 format and includes the left and right sEMG signal data, prompted text, keylogger ground-truth, and their corresponding timestamps. emg2qwerty.data.EMGSessionData offers a programmatic read-only interface into the HDF5 session files.
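
For example, a minimal sketch of opening a single session file (the constructor argument, context-manager usage, and attribute names below are assumptions; check emg2qwerty/data.py for the exact interface, and substitute a real session filename for the hypothetical one):

from pathlib import Path
from emg2qwerty.data import EMGSessionData

session_path = Path.home() / "emg2qwerty" / "data" / "some_session.hdf5"  # hypothetical filename
with EMGSessionData(session_path) as session:  # assumed to support use as a context manager
    print(session.metadata)  # prompted text, keylogger ground-truth, timestamps (assumed attribute)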

To load the metadata.csv file and print dataset statistics,

python scripts/print_dataset_stats.py

Dataset statistics

To re-generate data splits,

python scripts/generate_splits.py

The following figure visualizes the dataset splits for training, validation and testing of generic and personalized user models. Refer to the paper for details of the benchmark setup and data splits.

Data splits

To re-format data in EEG BIDS format,

python scripts/convert_to_bids.py

Training

Generic user model:

python -m emg2qwerty.train \
  user=generic \
  trainer.accelerator=gpu trainer.devices=8 \
  --multirun

Personalized user models:

python -m emg2qwerty.train \
  user="single_user" \
  trainer.accelerator=gpu trainer.devices=1

If you are using a Slurm cluster, include the cluster=slurm override in the argument list of the above commands to pick up config/cluster/slurm.yaml. This overrides the Hydra launcher to use the Submitit plugin. If you are not using a Slurm cluster, refer to the Hydra documentation for the list of available launcher plugins.
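
For example, to launch the generic user model training above on a Slurm cluster:

python -m emg2qwerty.train \
  user=generic \
  trainer.accelerator=gpu trainer.devices=8 \
  cluster=slurm \
  --multirun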

Testing

Greedy decoding:

python -m emg2qwerty.train \
  user="glob(user*)" \
  checkpoint="${HOME}/emg2qwerty/models/personalized-finetuned/\${user}.ckpt" \
  train=False trainer.accelerator=cpu \
  decoder=ctc_greedy \
  hydra.launcher.mem_gb=64 \
  --multirun

Beam-search decoding with 6-gram character-level language model:

python -m emg2qwerty.train \
  user="glob(user*)" \
  checkpoint="${HOME}/emg2qwerty/models/personalized-finetuned/\${user}.ckpt" \
  train=False trainer.accelerator=cpu \
  decoder=ctc_beam \
  hydra.launcher.mem_gb=64 \
  --multirun

The 6-gram character-level language model, used by the first-pass beam-search decoder above, is generated from the WikiText-103 raw dataset and built using KenLM. The LM is available under models/lm/ in both binary and human-readable ARPA formats. These can be regenerated as follows:

  1. Build kenlm from source: https://github.com/kpu/kenlm#compiling
  2. Run ./scripts/lm/build_char_lm.sh <ngram_order>
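
For example, to rebuild the 6-gram character-level LM used above:

./scripts/lm/build_char_lm.sh 6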

License

emg2qwerty is CC-BY-NC-4.0 licensed, as found in the LICENSE file.

Citing emg2qwerty

@misc{sivakumar2024emg2qwertylargedatasetbaselines,
      title={emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography},
      author={Viswanath Sivakumar and Jeffrey Seely and Alan Du and Sean R Bittner and Adam Berenzweig and Anuoluwapo Bolarinwa and Alexandre Gramfort and Michael I Mandel},
      year={2024},
      eprint={2410.20081},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.20081},
}
