This is the official repository for our paper *Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams*, presented at the Speech Synthesis Workshop 2025 (SSW13) in Leeuwarden, the Netherlands.
Create a conda environment via:

```shell
conda env create -f environment.yaml
```
- Prepare the data in Kaldi's `wav.scp` format.
- Use a pre-trained Kaldi HMM-DNN model to extract PPGs from the speech. The Kaldi documentation is helpful for this step.
- Extract speaker embeddings using the Wespeaker CLI. Specifically, run
  ```shell
  wespeaker --task embedding_kaldi --wav_scp YOUR_WAV.scp --output_file /path/to/embedding
  ```
  (see here).
- Extract pitch and periodicity using `ppg_tts/feature_extract/penn_log_f0_extract.py`.
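For reference, a Kaldi `wav.scp` file maps each utterance ID to its audio path, one entry per line; the IDs and paths below are placeholders:

```
utt0001 /data/finnish/wavs/utt0001.wav
utt0002 /data/finnish/wavs/utt0002.wav
```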
A pretrained checkpoint is available here.
A pretrained HiFi-GAN generator checkpoint is available here. Please put the HiFi-GAN checkpoint under `vocoder/hifigan/ckpt`.
```shell
python -m ppg_tts.main fit -c config/fit_ppgmatcha.yaml -c config/data_template.yaml
```
You can override the arguments via the CLI; see the PyTorch Lightning docs.
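With PyTorch Lightning's `LightningCLI`, config keys can be overridden either with dotted command-line flags (e.g. `--trainer.max_epochs 200`) or via an extra YAML file passed with another `-c`. The fragment below is a hypothetical sketch; the actual keys depend on the configs in this repo:

```yaml
# Hypothetical override file, assuming the standard
# PyTorch Lightning trainer config layout.
trainer:
  max_epochs: 200
  devices: 1
```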
- Copy synthesis: see `ppg_tts/evaluation/evaluate_copy_synthesis.sh`
- Speaker switching: see `ppg_tts/evaluation/evaluate_switch_speaker.sh`
- Editing: see `ppg_tts/evaluation/evaluate_editing/evaluate_editing.sh`
We currently do not provide a dedicated script for running inference with a pre-trained model with minimal effort, but inference can be done by executing specific stages of the evaluation scripts. A dedicated inference script will be added in the future.
For inference, prepare the data the same way as the training data (see here).
Follow the comments in `ppg_tts/evaluation/evaluate_switch_speaker.sh` and set `start=0` and `end=1` to run TTS inference.
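The `start`/`end` variables gate which stages of the script run, following the usual Kaldi-style stage pattern. The sketch below (with placeholder stage bodies, not the actual script contents) shows the idiom: a stage executes only when `start <= stage <= end`.

```shell
#!/usr/bin/env bash
# Kaldi-style stage gating: a stage runs only if start <= stage <= end.
start=0
end=1

stage=0
if [ "$start" -le "$stage" ] && [ "$end" -ge "$stage" ]; then
  echo "Stage 0: prepare features"    # placeholder body
fi

stage=1
if [ "$start" -le "$stage" ] && [ "$end" -ge "$stage" ]; then
  echo "Stage 1: run TTS inference"   # placeholder body
fi

stage=2
if [ "$start" -le "$stage" ] && [ "$end" -ge "$stage" ]; then
  echo "Stage 2: compute metrics"     # skipped when end=1
fi
```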
Follow the comments in `ppg_tts/evaluation/evaluate_editing/evaluate_editing.sh` and set `start=0` and `end=1` to run editing inference.
Coming soon.
Our work is shared under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.