Releases: mdangschat/ctc-asr
Version 0.1.0
- Excluded corpus creation from this project and moved it into its own repository (speech-corpus-dl)
- Restructured the bucket boundary calculations
- Updated documentation
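One common way to derive bucket boundaries is from quantiles of the sequence-length distribution, so each bucket holds roughly the same number of samples. A minimal sketch of that idea; the function name and quantile scheme are illustrative assumptions, not the project's actual calculation:

```python
def bucket_boundaries(lengths, num_buckets):
    """Derive bucket boundaries from quantiles of the length distribution.

    Illustrative only: the project's actual boundary calculation may differ.
    """
    ordered = sorted(lengths)
    boundaries = []
    for i in range(1, num_buckets):
        # Pick the length at the i-th quantile as a boundary, so each
        # bucket covers roughly the same number of samples.
        idx = (i * len(ordered)) // num_buckets
        boundary = ordered[idx]
        if boundary not in boundaries:
            boundaries.append(boundary)
    return boundaries

# Example: 8 sequence lengths split into 4 buckets.
print(bucket_boundaries([3, 5, 7, 9, 11, 13, 15, 17], 4))  # [7, 11, 15]
```

Quantile-based boundaries keep buckets balanced even when the length distribution is skewed, which fixed-width boundaries would not.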
Version 0.0.7
Update to TensorFlow version 1.12.0+, using the `tf.data` API.
- The model now uses TensorFlow's current `tf.data` API.
- The model now uses an Estimator.
- Corpus
  - The corpus metadata is now stored in CSV files.
  - The LibriSpeech loader can now use multiple threads.
  - Improved corpus generation performance.
- Fixed the GPU logging hook.
- Major code refactoring.
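CSV-based corpus metadata can be read with Python's standard `csv` module. A minimal sketch, assuming hypothetical `path` and `label` columns; the project's actual column layout may differ:

```python
import csv
import io

def load_corpus_metadata(csv_file):
    """Read corpus entries (audio path, transcription) from a CSV file.

    The `path`/`label` column names are assumptions for illustration.
    """
    reader = csv.DictReader(csv_file)
    return [(row['path'], row['label']) for row in reader]

# Example with an in-memory CSV instead of a real file on disk.
data = io.StringIO("path,label\ntrain/001.wav,hello world\ntrain/002.wav,good morning\n")
for path, label in load_corpus_metadata(data):
    print(path, label)
```

Plain CSV files make the corpus metadata easy to inspect, diff, and regenerate without custom tooling.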
Version 0.0.6
- Added a downloader script that downloads and assembles the training corpus used.
- Unified `python/model.py` for the different DS1 and DS2 architectures.
Version 0.0.5
First Batch of Extended Goals
- Removed overlong samples
- Extended plotting (pyplot without display)
- Added new dataset: Tatoeba
- Rewrote large parts of the dataset-loading functionality
- Added a random seed
- Unified relative paths based on `params.BASE_PATH`
- Bug fixes
Version 0.0.4
Performance Optimization and Evaluation
- After every epoch, a validation run is performed on the `dev` dataset. #68
- Fixed a SortaGrad bug when switching away from the first epoch. #108, #121
- Increased the number of buckets to reduce the amount of padding used. #106
- Storing more checkpoints; also spaced out the steps at which checkpoints are written. #122
- Combined the [DS1] style model into the modified [DS2] style one. Switching between them can be done via the `params.py` options. #119, #110
- Added threading to most `/loader/` functions. #112
- Optimized loader methods and datasets further. #112, #99, #113
- See Milestone v0.0.4 for further information.
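Bucketing groups sequences of similar length so that each batch only pads up to its bucket's longest member; more buckets means tighter grouping and less wasted padding. A rough sketch of the bucket-assignment step (not the project's implementation):

```python
import bisect

def assign_bucket(length, boundaries):
    """Return the index of the bucket a sequence of this length falls into.

    Boundaries are sorted upper bounds; a length greater than the last
    boundary goes into the final overflow bucket.
    """
    return bisect.bisect_left(boundaries, length)

boundaries = [100, 200, 400]
# Similar-length sequences land in the same bucket, so the padding
# needed within any one batch stays small.
print([assign_bucket(n, boundaries) for n in [50, 150, 199, 350, 900]])  # [0, 1, 1, 2, 3]
```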
Version 0.0.3
Deep Speech 2 Inspired Features
- Added SortaGrad ordering of training data for the first epoch. [Deep Speech 2]
- Replaced the first 3 dense layers with 2D convolutions over time and frequency. [DS2]
- Added support for cuDNN RNN cell usage.
- Datasets
  - Added Mozilla Common Voice data.
  - Removed audio files whose labels were only 1 character long.
  - Removed files from TEDLIUM that had labels shorter than 5 words.
  - Removed audio files with feature vectors longer than 1750.
- Added documentation about tensor shapes back in.
- Changed the input features to an 80 element logarithmic filter bank, with an option to switch to 80 element MFCC features.
- See Milestone v0.0.3 for further information.
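SortaGrad, as described in the Deep Speech 2 paper, presents training samples sorted shortest-first in the first epoch (so early batches are short and easy) and shuffles them in later epochs. A schematic sketch of that ordering logic; the function and seed handling are illustrative, not the project's code:

```python
import random

def epoch_order(samples, epoch, seed=42):
    """Return the sample order for a given epoch, SortaGrad style.

    Epoch 0: sorted shortest-first (curriculum warm-up).
    Later epochs: deterministic random shuffle per epoch.
    """
    if epoch == 0:
        return sorted(samples, key=len)
    rng = random.Random(seed + epoch)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled

samples = ["a long utterance here", "hi", "medium one"]
print(epoch_order(samples, epoch=0))  # shortest first
```

The bug fixed in v0.0.4 (#108, #121) concerned exactly this transition from the sorted first epoch to the shuffled later epochs.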
Version 0.0.2
Improvements & Preprocessing
- Added Dropout functionality.
- Evaluation can now be run on the `validation` or `test` dataset.
- Preprocessing and input optimization:
  - Using `python_speech_features` to load audio, instead of the old `librosa`, which had problems running in parallel.
  - Added three new feature normalization methods.
  - Skipped every 2nd input feature frame (on the time axis).
  - Audio files can now be normalized by power level.
- Added support for Baidu's WarpCTC, which should be faster.
  - WarpCTC usage is bugged in evaluation runs.
- Updated the project `README.md`.
- Updated `generate_txt.py` for the new datasets (TEDLIUMv2, LibriSpeech, TIMIT).
- For more details, please refer to Milestone v0.0.2.
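Skipping every 2nd feature frame halves the time dimension the network has to process, and amounts to a simple striding operation along the time axis. A minimal sketch, using a plain list of frames as a stand-in for the actual feature matrices:

```python
def skip_frames(features, stride=2):
    """Keep every `stride`-th frame along the time axis.

    For stride=2 this halves the sequence length fed to the RNN,
    trading some temporal resolution for speed.
    """
    return features[::stride]

frames = [[0.1], [0.2], [0.3], [0.4], [0.5]]
print(skip_frames(frames))  # keeps frames 0, 2 and 4
```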
Version 0.0.1
First Prototype
DeepSpeech (v1)
- Trains on the LibriSpeech ASR Corpus.
- Uses GRUCells in the bidirectional layer.
- Layout: 3 dense layers, 1 bidirectional RNN layer, 1 dense layer, 1 dense/logits layer.
- Evaluation isn't polished.
- No single input inference implemented.
- See Milestone v0.0.1 for further information.