Releases: mdangschat/ctc-asr

Version 0.1.0

21 Apr 12:16
19d4ac5
  • Moved corpus creation out of this project and into its own repository (speech-corpus-dl)
  • Restructured the bucket boundary calculations
  • Updated documentation
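
The restructured bucket boundary calculation could, for example, derive boundaries from the distribution of sequence lengths. A minimal sketch (the function name and the quantile-based approach are illustrative assumptions, not the project's actual code):

```python
def bucket_boundaries(lengths, num_buckets):
    """Derive bucket boundaries so that each bucket holds roughly the
    same number of sequences (quantile-based split)."""
    ordered = sorted(lengths)
    step = len(ordered) / num_buckets
    # Pick the length at each quantile as a boundary.
    boundaries = [ordered[int(i * step)] for i in range(1, num_buckets)]
    # Remove duplicates while preserving sorted order.
    return sorted(set(boundaries))

# Example: lengths of 8 training utterances (in feature frames).
print(bucket_boundaries([120, 340, 95, 410, 220, 180, 305, 260], 4))
# → [180, 260, 340]
```

Equal-population buckets keep padding low because sequences grouped into the same bucket have similar lengths.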

Version 0.0.7

10 Dec 11:03
4c21c89
Pre-release

Updated to TensorFlow version 1.12.0+ and switched to the tf.data API.

  • The model now uses TensorFlow's current tf.data API.
  • The model now uses an Estimator.
  • Corpus
    • The corpus metadata is now stored in CSV files.
    • The LibriSpeech loader can now use multiple threads.
    • Improved corpus generation performance.
  • Fixed the GPU logging hook.
  • Major code refactoring.
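
Storing the corpus metadata in CSV files keeps the loader simple; a sketch of what reading such a file might look like (the column names `path`, `label`, and `length` are assumptions for illustration, not the project's actual schema):

```python
import csv
import io

# Hypothetical corpus metadata: audio path, transcription, length in frames.
csv_text = """path,label,length
train/sample_0001.wav,hello world,412
train/sample_0002.wav,open the door,518
"""

def load_corpus_metadata(fileobj):
    """Read corpus entries from a CSV file object into a list of dicts."""
    reader = csv.DictReader(fileobj)
    return [
        {"path": row["path"], "label": row["label"], "length": int(row["length"])}
        for row in reader
    ]

entries = load_corpus_metadata(io.StringIO(csv_text))
print(entries[0]["label"])  # → hello world
```

A flat CSV of (path, label, length) rows is also convenient for bucketing, since the precomputed lengths avoid decoding audio just to sort by duration.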

Version 0.0.6

12 Nov 09:06
62a6150
Pre-release
  • Added a downloader script that downloads and assembles the training corpus.
  • Unified python/model.py for the different DS1 and DS2 architectures.

Version 0.0.5

02 Nov 16:28
Pre-release

First Batch of Extended Goals

  • Removed overlong samples
  • Extended plotting (pyplot without display)
  • Added new dataset: Tatoeba
  • Rewrote large parts of the dataset-loading functionality
  • Added a random seed
  • Unified relative paths based on params.BASE_PATH
  • Bug fixes

Version 0.0.4

02 Nov 16:27
Pre-release

Performance Optimization and Evaluation

  • After every epoch, a validation run is performed on the dev dataset. #68
  • Fixed a SortaGrad bug when switching away from the first epoch. #108, #121
  • Increased the number of buckets to reduce the amount of padding. #106
  • Now stores more checkpoints and spaces out the steps at which they are written. #122
  • Merged the [DS1] style model into the modified [DS2] style one; switching between them is done via the params.py options. #119, #110
  • Added threading to most /loader/ functions. #112
  • Optimized the loader methods and datasets further. #112, #99, #113
  • See Milestone v0.0.4 for further information.

Version 0.0.3

02 Nov 16:27
Pre-release

Deep Speech 2 Inspired Features

  • Added SortaGrad ordering of training data for the first epoch. [Deep Speech 2]
  • Replaced the first 3 dense layers with 2D convolutions over time and frequency. [DS2]
  • Added support for cuDNN RNN cell usage.
  • Datasets
    • Added Mozilla Common Voice data.
    • Removed audio files whose labels are only 1 character long.
    • Removed files from TEDLIUM that had labels shorter than 5 words.
    • Removed audio files that had feature vectors longer than 1750.
  • Added documentation about tensor shapes back in.
  • Changed the input features to an 80 element logarithmic filter bank, with an option to switch to 80 element MFCC features.
  • See Milestone v0.0.3 for further information.
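
The SortaGrad idea from Deep Speech 2 can be sketched in a few lines: present utterances shortest-first during the first epoch, then shuffle in later epochs (the function and variable names below are illustrative, not taken from the project):

```python
import random

def epoch_order(samples, epoch, seed=42):
    """SortaGrad: shortest-first in epoch 0, random shuffle afterwards.
    `samples` is a list of (length, path) tuples."""
    if epoch == 0:
        # First epoch: curriculum by length stabilizes early CTC training.
        return sorted(samples, key=lambda s: s[0])
    # Later epochs: deterministic shuffle derived from the seed and epoch.
    rng = random.Random(seed + epoch)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled

samples = [(412, "a.wav"), (95, "b.wav"), (260, "c.wav")]
print(epoch_order(samples, epoch=0))  # shortest first
```

Seeding the shuffle per epoch keeps runs reproducible while still varying the order between epochs.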

Version 0.0.2

02 Nov 16:26
Pre-release

Improvements & Preprocessing

  • Added Dropout functionality.
  • Evaluation can now be run on the validation or test dataset.
  • Preprocessing and input optimization:
    • Using python_speech_features to load audio instead of the old librosa, which had problems running in parallel.
    • Added three new feature normalization methods.
    • Skipped every 2nd input feature frame (on the time axis).
    • Audio files can now be normalized by power level.
  • Added support for Baidu's WarpCTC, which should be faster.
    • Known issue: WarpCTC usage is bugged in evaluation runs.
  • Updated project README.md.
  • Updated generate_txt.py for new datasets (TEDLIUMv2, LibriSpeech, TIMIT).
  • See Milestone v0.0.2 for more details.
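
Skipping every second input feature frame on the time axis is plain slicing, and a normalization method can be as simple as a global zero-mean/unit-variance rescale. Both sketches below are illustrative (the function names are assumptions, and the project ships several normalization variants):

```python
def skip_frames(features, step=2):
    """Keep every `step`-th frame along the time axis.
    `features` is a list of frames, each frame a list of feature values."""
    return features[::step]

def normalize_global(features):
    """Rescale all values to zero mean and unit variance (one of several
    possible normalization schemes)."""
    flat = [v for frame in features for v in frame]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = var ** 0.5 or 1.0  # guard against constant input
    return [[(v - mean) / std for v in frame] for frame in features]

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(len(skip_frames(frames)))  # → 2
```

Halving the frame rate roughly halves the sequence length seen by the RNN, which cuts training time at a modest cost in temporal resolution.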

Version 0.0.1

02 Nov 16:24
Pre-release

First Prototype

DeepSpeech (v1)

  • Trains on the LibriSpeech ASR Corpus.
  • Uses GRUCells in the bidirectional layer.
  • Layout: 3 dense layers, 1 bidirectional RNN layer, 1 dense layer, and 1 dense/logits layer.
  • Evaluation isn't polished.
  • No single-input inference is implemented.
  • See Milestone v0.0.1 for further information.
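
The layer layout above can be traced as a sequence of tensor shapes. The feature size, hidden width, and alphabet size below are illustrative assumptions (not the project's actual parameters):

```python
def ds1_shapes(time_steps, num_features=80, hidden=1024, num_classes=29):
    """Trace per-layer output shapes through the DS1-style layout:
    3 dense -> 1 bidirectional RNN -> 1 dense -> 1 dense/logits."""
    shapes = [(time_steps, num_features)]        # input features
    for _ in range(3):                           # 3 dense layers
        shapes.append((time_steps, hidden))
    shapes.append((time_steps, 2 * hidden))      # bidirectional GRU (fwd + bwd concat)
    shapes.append((time_steps, hidden))          # dense
    shapes.append((time_steps, num_classes))     # dense/logits for CTC
    return shapes

print(ds1_shapes(100)[-1])  # → (100, 29)
```

The logits keep the full time dimension because CTC consumes one distribution over the alphabet (plus blank) per frame.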