Commit cb96738

release 1.1.1

1 parent 6fb3fdd

11 files changed, 287 additions and 883 deletions

docs/README.rst

Lines changed: 1 addition & 0 deletions
@@ -94,6 +94,7 @@ Malaya-Speech also released pretrained models, simply check at `malaya-speech/pr
 - **FastSpeechSplit**, Unsupervised Speech Decomposition Via Triple Information Bottleneck using Transformer, no paper produced.
 - **Sepformer**, Attention is All You Need in Speech Separation, https://arxiv.org/abs/2010.13154
 - **FastSpeechSplit**, Faster and Accurate Speech Split Conversion using Transformer, no paper produced.
+- **HuBERT**, Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, https://arxiv.org/pdf/2106.07447v1.pdf
 
 References
 -----------

docs/load-stt-transducer-model-mixed.ipynb

Lines changed: 4 additions & 4 deletions
@@ -145,8 +145,8 @@
 " <th>large-conformer-mixed</th>\n",
 " <td>404</td>\n",
 " <td>107</td>\n",
-" <td>0.25903</td>\n",
-" <td>0.17893</td>\n",
+" <td>0.24829</td>\n",
+" <td>0.16606</td>\n",
 " <td>[malay, singlish]</td>\n",
 " </tr>\n",
 " </tbody>\n",
@@ -160,7 +160,7 @@
 "large-conformer 404 107 0.15986 0.05937 \n",
 "alconformer 38.1 15.1 0.20703 0.08533 \n",
 "conformer-mixed 125 37.1 0.25314 0.15836 \n",
-"large-conformer-mixed 404 107 0.25903 0.17893 \n",
+"large-conformer-mixed 404 107 0.24829 0.16606 \n",
 "\n",
 " Language \n",
 "small-conformer [malay] \n",
@@ -184,7 +184,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Lower is better. `mixed` and `bahasa` models tested on different test set."
+"Lower is better. Mixed models tested on different dataset."
 ]
 },
 {
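
The updated `large-conformer-mixed` row in this notebook reports WER and CER (word error rate and character error rate), and the cell notes that lower is better. The sketch below assumes the usual definition of these metrics: the Levenshtein edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the reference length in words or in characters. It is not Malaya-Speech's own scoring code, and the helper names and example sentences are hypothetical.

```python
# Illustrative sketch of the standard WER/CER definition;
# not Malaya-Speech's own scoring code.


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, deletions, insertions)."""
    # prev is the previous DP row: distance from the ref prefix seen so far to each hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r
                curr[j - 1] + 1,     # insert h
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("saya suka makan nasi", "saya suka makan roti"))  # 0.25
print(cer("makan", "makin"))  # 0.2
```

Both rates are normalised by the reference, not the hypothesis. A transcript with many insertions can therefore score above 1.0.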

docs/load-stt-transducer-model.ipynb

Lines changed: 3 additions & 107 deletions
@@ -38,7 +38,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -56,115 +56,11 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {
 "scrolled": false
 },
-"outputs": [
-{
-"data": {
-"text/html": [
-"<div>\n",
-"<style scoped>\n",
-" .dataframe tbody tr th:only-of-type {\n",
-" vertical-align: middle;\n",
-" }\n",
-"\n",
-" .dataframe tbody tr th {\n",
-" vertical-align: top;\n",
-" }\n",
-"\n",
-" .dataframe thead th {\n",
-" text-align: right;\n",
-" }\n",
-"</style>\n",
-"<table border=\"1\" class=\"dataframe\">\n",
-" <thead>\n",
-" <tr style=\"text-align: right;\">\n",
-" <th></th>\n",
-" <th>Size (MB)</th>\n",
-" <th>Quantized Size (MB)</th>\n",
-" <th>WER</th>\n",
-" <th>CER</th>\n",
-" <th>Language</th>\n",
-" </tr>\n",
-" </thead>\n",
-" <tbody>\n",
-" <tr>\n",
-" <th>small-conformer</th>\n",
-" <td>49.2</td>\n",
-" <td>18.1</td>\n",
-" <td>0.20599</td>\n",
-" <td>0.08933</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>conformer</th>\n",
-" <td>125</td>\n",
-" <td>37.1</td>\n",
-" <td>0.16547</td>\n",
-" <td>0.0641</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>large-conformer</th>\n",
-" <td>404</td>\n",
-" <td>107</td>\n",
-" <td>0.15986</td>\n",
-" <td>0.05937</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>alconformer</th>\n",
-" <td>38.1</td>\n",
-" <td>15.1</td>\n",
-" <td>0.20703</td>\n",
-" <td>0.08533</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>conformer-mixed</th>\n",
-" <td>125</td>\n",
-" <td>37.1</td>\n",
-" <td>0.35191</td>\n",
-" <td>0.23667</td>\n",
-" <td>[malay, singlish]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>large-conformer-mixed</th>\n",
-" <td>404</td>\n",
-" <td>107</td>\n",
-" <td>0.3359</td>\n",
-" <td>0.1989</td>\n",
-" <td>[malay, singlish]</td>\n",
-" </tr>\n",
-" </tbody>\n",
-"</table>\n",
-"</div>"
-],
-"text/plain": [
-" Size (MB) Quantized Size (MB) WER CER \\\n",
-"small-conformer 49.2 18.1 0.20599 0.08933 \n",
-"conformer 125 37.1 0.16547 0.0641 \n",
-"large-conformer 404 107 0.15986 0.05937 \n",
-"alconformer 38.1 15.1 0.20703 0.08533 \n",
-"conformer-mixed 125 37.1 0.35191 0.23667 \n",
-"large-conformer-mixed 404 107 0.3359 0.1989 \n",
-"\n",
-" Language \n",
-"small-conformer [malay] \n",
-"conformer [malay] \n",
-"large-conformer [malay] \n",
-"alconformer [malay] \n",
-"conformer-mixed [malay, singlish] \n",
-"large-conformer-mixed [malay, singlish] "
-]
-},
-"execution_count": 2,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
+"outputs": [],
 "source": [
 "malaya_speech.stt.available_transducer()"
 ]
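
In both hunks, the code cells keep their `source`, but `execution_count` is reset to `null` and `outputs` is emptied. This drops the stale benchmark table that `malaya_speech.stt.available_transducer()` had printed. The commit does not say which tool made this change. The sketch below shows one way to apply the same transformation with only the standard library. It assumes the nbformat v4 JSON layout: a top-level `cells` list whose code cells carry `execution_count` and `outputs`. The helper names are hypothetical.

```python
import json

# Hypothetical helpers assuming nbformat v4 JSON; the commit does not say
# how the outputs were actually cleared.


def clear_outputs(notebook: dict) -> dict:
    """Reset execution counts and drop outputs of every code cell, keeping sources."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # serialized by json as null
            cell["outputs"] = []
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Rewrite an .ipynb file in place with its code-cell outputs cleared."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # Jupyter itself writes notebooks with one-space indentation.
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Markdown cells, such as the "Lower is better" note, pass through untouched. From the command line, `jupyter nbconvert --clear-output --inplace <notebook>.ipynb` should have the same effect.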

example/stt-transducer-model-mixed/load-stt-transducer-model-mixed.ipynb

Lines changed: 4 additions & 4 deletions
@@ -145,8 +145,8 @@
 " <th>large-conformer-mixed</th>\n",
 " <td>404</td>\n",
 " <td>107</td>\n",
-" <td>0.25903</td>\n",
-" <td>0.17893</td>\n",
+" <td>0.24829</td>\n",
+" <td>0.16606</td>\n",
 " <td>[malay, singlish]</td>\n",
 " </tr>\n",
 " </tbody>\n",
@@ -160,7 +160,7 @@
 "large-conformer 404 107 0.15986 0.05937 \n",
 "alconformer 38.1 15.1 0.20703 0.08533 \n",
 "conformer-mixed 125 37.1 0.25314 0.15836 \n",
-"large-conformer-mixed 404 107 0.25903 0.17893 \n",
+"large-conformer-mixed 404 107 0.24829 0.16606 \n",
 "\n",
 " Language \n",
 "small-conformer [malay] \n",
@@ -184,7 +184,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Lower is better. `mixed` and `bahasa` models tested on different test set."
+"Lower is better. Mixed models tested on different dataset."
 ]
 },
 {
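
This hunk lowers the `large-conformer-mixed` WER from 0.25903 to 0.24829 and its CER from 0.17893 to 0.16606. The short calculation below uses only those four numbers to work out the size of the update. According to the notebook's note, mixed models are scored on a different dataset from the Malay-only models. The comparison is therefore only between the mixed model and its own previous score.

```python
# Old and new large-conformer-mixed scores, copied from the diff above.
# Only compare the mixed model against itself: it is scored on a different
# dataset from the Malay-only models.
old = {"WER": 0.25903, "CER": 0.17893}
new = {"WER": 0.24829, "CER": 0.16606}

for metric in ("WER", "CER"):
    absolute = old[metric] - new[metric]
    relative = absolute / old[metric]
    print(f"{metric}: -{absolute:.5f} absolute, -{relative:.1%} relative")
# WER: -0.01074 absolute, -4.1% relative
# CER: -0.01287 absolute, -7.2% relative
```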

example/stt-transducer-model/load-stt-transducer-model.ipynb

Lines changed: 3 additions & 107 deletions
@@ -38,7 +38,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -56,115 +56,11 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {
 "scrolled": false
 },
-"outputs": [
-{
-"data": {
-"text/html": [
-"<div>\n",
-"<style scoped>\n",
-" .dataframe tbody tr th:only-of-type {\n",
-" vertical-align: middle;\n",
-" }\n",
-"\n",
-" .dataframe tbody tr th {\n",
-" vertical-align: top;\n",
-" }\n",
-"\n",
-" .dataframe thead th {\n",
-" text-align: right;\n",
-" }\n",
-"</style>\n",
-"<table border=\"1\" class=\"dataframe\">\n",
-" <thead>\n",
-" <tr style=\"text-align: right;\">\n",
-" <th></th>\n",
-" <th>Size (MB)</th>\n",
-" <th>Quantized Size (MB)</th>\n",
-" <th>WER</th>\n",
-" <th>CER</th>\n",
-" <th>Language</th>\n",
-" </tr>\n",
-" </thead>\n",
-" <tbody>\n",
-" <tr>\n",
-" <th>small-conformer</th>\n",
-" <td>49.2</td>\n",
-" <td>18.1</td>\n",
-" <td>0.20599</td>\n",
-" <td>0.08933</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>conformer</th>\n",
-" <td>125</td>\n",
-" <td>37.1</td>\n",
-" <td>0.16547</td>\n",
-" <td>0.0641</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>large-conformer</th>\n",
-" <td>404</td>\n",
-" <td>107</td>\n",
-" <td>0.15986</td>\n",
-" <td>0.05937</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>alconformer</th>\n",
-" <td>38.1</td>\n",
-" <td>15.1</td>\n",
-" <td>0.20703</td>\n",
-" <td>0.08533</td>\n",
-" <td>[malay]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>conformer-mixed</th>\n",
-" <td>125</td>\n",
-" <td>37.1</td>\n",
-" <td>0.35191</td>\n",
-" <td>0.23667</td>\n",
-" <td>[malay, singlish]</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>large-conformer-mixed</th>\n",
-" <td>404</td>\n",
-" <td>107</td>\n",
-" <td>0.3359</td>\n",
-" <td>0.1989</td>\n",
-" <td>[malay, singlish]</td>\n",
-" </tr>\n",
-" </tbody>\n",
-"</table>\n",
-"</div>"
-],
-"text/plain": [
-" Size (MB) Quantized Size (MB) WER CER \\\n",
-"small-conformer 49.2 18.1 0.20599 0.08933 \n",
-"conformer 125 37.1 0.16547 0.0641 \n",
-"large-conformer 404 107 0.15986 0.05937 \n",
-"alconformer 38.1 15.1 0.20703 0.08533 \n",
-"conformer-mixed 125 37.1 0.35191 0.23667 \n",
-"large-conformer-mixed 404 107 0.3359 0.1989 \n",
-"\n",
-" Language \n",
-"small-conformer [malay] \n",
-"conformer [malay] \n",
-"large-conformer [malay] \n",
-"alconformer [malay] \n",
-"conformer-mixed [malay, singlish] \n",
-"large-conformer-mixed [malay, singlish] "
-]
-},
-"execution_count": 2,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
+"outputs": [],
 "source": [
 "malaya_speech.stt.available_transducer()"
 ]

pretrained-model/stt/conformer/README.md

Lines changed: 5 additions & 1 deletion
@@ -66,4 +66,8 @@ Tensorboard, https://tensorboard.dev/experiment/1qBD7FGyS32Q8uQvhA1NnA/
 
 12. Conformer, last update 12th June 2021, [output-base-conformer-v4.tar.gz](https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-base-conformer-v4.tar.gz)
 
-13. Large Conformer, last update 12th June 2021, [output-large-conformer-v4.tar.gz](https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-large-conformer-v4.tar.gz)
+13. Large Conformer, last update 12th June 2021, [output-large-conformer-v4.tar.gz](https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-large-conformer-v4.tar.gz)
+
+14. Conformer Mixed, last update 29th June 2021, [output-base-mixed-conformer-v2.tar.gz](https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-base-mixed-conformer-v2.tar.gz)
+
+15. Large Conformer Mixed, last update 29th June 2021, [output-large-mixed-conformer-v2.tar.gz](https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-large-conformer-mixed.tar.gz)
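
Each entry in this list links to a `.tar.gz` checkpoint archive on Backblaze B2. The sketch below fetches and unpacks one archive using only the standard library. The helper name `fetch_checkpoint` and the default `checkpoints` directory are hypothetical, and this helper is not part of the `malaya_speech` API. The README does not describe what is inside the archives, so the sketch simply returns the extracted member names. It names the local file after the URL rather than the link text, because the two differ for entry 15: the link text is `output-large-mixed-conformer-v2.tar.gz`, but the URL points to `output-large-conformer-mixed.tar.gz`.

```python
import os
import tarfile
import urllib.parse
import urllib.request

# Hypothetical helper for the checkpoint links above; not part of malaya_speech.


def fetch_checkpoint(url: str, dest_dir: str = "checkpoints") -> list:
    """Download a .tar.gz archive (unless already present) and extract it into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # Name the local file after the last path component of the URL, not the link text.
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    archive_path = os.path.join(dest_dir, filename)
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, ".." traversal and special files.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names
```

For example, `fetch_checkpoint("https://f000.backblazeb2.com/file/malaya-speech-model/pretrained/output-base-mixed-conformer-v2.tar.gz")` would download entry 14 into `checkpoints/` and unpack it there. Calling it again skips the download because the archive already exists.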
