Commit fe29631

Merge branch 'master' into master
2 parents 17517cb + cc943c3

88 files changed: +1884 −55 lines


.install

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 virtualenv -p python3 ../tmp/venv
 source ../tmp/venv/bin/activate
 pip install -r <(grep -v tensorflow requirements.txt)
-pip install tensorflow-gpu==1.12.0rc2
+pip install tensorflow-gpu==1.12.0
 
 python3 util/taskcluster.py --arch gpu --target ../tmp/native_client
 

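A quick way to sanity-check the result of `.install` (a sketch, not part of the repo; assumes the virtualenv above is active):

```bash
# Should print 1.12.0 and report whether TensorFlow can see the GPU
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"
```
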
DeepSpeech.py

Lines changed: 1 addition & 0 deletions
@@ -890,6 +890,7 @@ def main(_):
     if len(FLAGS.worker_hosts) == 0:
         # Only one local task: this process (default case - no cluster)
         with tf.Graph().as_default():
+            tf.set_random_seed(FLAGS.random_seed)
             train()
         # Now do a final test epoch
         if FLAGS.test:

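For context on the `tf.set_random_seed` addition: in TF 1.x the graph-level seed combines with per-op seeds, so setting it once inside the graph scope makes random initialization reproducible across runs. A minimal sketch of the mechanism (illustrative, not DeepSpeech's code):

```python
import tensorflow as tf

with tf.Graph().as_default():
    tf.set_random_seed(4568)           # plays the role of FLAGS.random_seed above
    sample = tf.random_uniform([3])    # op derives its seed from the graph-level seed
    with tf.Session() as sess:
        print(sess.run(sample))        # identical output on every fresh run of the script
```
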
Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -186,7 +186,7 @@ RUN cp /tensorflow/bazel-bin/native_client/generate_trie /DeepSpeech/native_clie
 
 # Install TensorFlow
 WORKDIR /DeepSpeech/
-RUN pip install tensorflow-gpu==1.12.0rc2
+RUN pip install tensorflow-gpu==1.12.0
 
 
 # Make DeepSpeech and install Python bindings

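For reference, one common way to build and enter this image (a sketch; the image tag is arbitrary, and the NVIDIA container runtime is assumed to be configured on the host):

```bash
docker build -t deepspeech-train .
docker run --runtime=nvidia -it deepspeech-train /bin/bash
```
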
README.md

Lines changed: 2 additions & 2 deletions
@@ -227,7 +227,7 @@ If you have a capable (Nvidia, at least 8GB of VRAM) GPU, it is highly recommend
 
 ```bash
 pip3 uninstall tensorflow
-pip3 install 'tensorflow-gpu==1.12.0rc2'
+pip3 install 'tensorflow-gpu==1.12.0'
 ```
 
 ### Common Voice training data
@@ -284,7 +284,7 @@ If you are brave enough, you can also include the `other` dataset, which contain
 The central (Python) script is `DeepSpeech.py` in the project's root directory. For its list of command line options, you can call:
 
 ```bash
-./DeepSpeech.py --help
+./DeepSpeech.py --helpfull
 ```
 
 To get the output of this in a slightly better-formatted way, you can also look up the option definitions at the top of `DeepSpeech.py`.

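The `--helpfull` flag comes from the TensorFlow/absl flags machinery that `DeepSpeech.py` uses for its options. A minimal sketch of that pattern (flag names and defaults here are illustrative, not DeepSpeech's exact definitions):

```python
import tensorflow as tf

tf.app.flags.DEFINE_integer('random_seed', 4568, 'seed used to initialize variables')
tf.app.flags.DEFINE_boolean('test', True, 'run a test epoch after training')
FLAGS = tf.app.flags.FLAGS

def main(_):
    print('random_seed =', FLAGS.random_seed)

if __name__ == '__main__':
    tf.app.run()  # parses flags; --helpfull lists every registered flag
```
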
VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.0-alpha.0
+0.4.0-alpha.2
examples/ffmpeg_vad_streaming/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# FFmpeg VAD Streaming
+
+Streaming inference from an arbitrary source (FFmpeg input) to DeepSpeech, using VAD (voice activity detection). A fairly simple example demonstrating the DeepSpeech streaming API in Node.js.
+
+This example was successfully tested with a mobile phone streaming a live feed to an RTMP server (nginx-rtmp), which could then be used by this script for near-real-time speech recognition.
+
+## Installation
+
+```bash
+npm install
+```
+
+FFmpeg must also be installed:
+
+```bash
+sudo apt-get install ffmpeg
+```
+
+## Usage
+
+Here is an example for a local audio file:
+```bash
+node ./index.js --audio <AUDIO_FILE> --model $HOME/models/output_graph.pbmm --alphabet $HOME/models/alphabet.txt
+```
+
+Here is an example for a remote RTMP stream:
+```bash
+node ./index.js --audio rtmp://<IP>:1935/live/teststream --model $HOME/models/output_graph.pbmm --alphabet $HOME/models/alphabet.txt
+```

examples/ffmpeg_vad_streaming/index.js

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+#!/usr/bin/env node
+
+const VAD = require("node-vad");
+const Ds = require('deepspeech');
+const argparse = require('argparse');
+const util = require('util');
+
+// These constants control the beam search decoder
+
+// Beam width used in the CTC decoder when building candidate transcriptions
+const BEAM_WIDTH = 1024;
+
+// The alpha hyperparameter of the CTC decoder. Language model weight
+const LM_WEIGHT = 1.50;
+
+// Valid word insertion weight. This is used to lessen the word insertion penalty
+// when the inserted word is part of the vocabulary
+const VALID_WORD_COUNT_WEIGHT = 2.25;
+
+// These constants are tied to the shape of the graph used (changing them changes
+// the geometry of the first layer), so make sure you use the same constants that
+// were used during training
+
+// Number of MFCC features to use
+const N_FEATURES = 26;
+
+// Size of the context window used for producing timesteps in the input vector
+const N_CONTEXT = 9;
+
+let VersionAction = function VersionAction(options) {
+    options = options || {};
+    options.nargs = 0;
+    argparse.Action.call(this, options);
+};
+
+util.inherits(VersionAction, argparse.Action);
+
+VersionAction.prototype.call = function(parser) {
+    Ds.printVersions();
+    process.exit(0);
+};
+
+let parser = new argparse.ArgumentParser({addHelp: true, description: 'Running DeepSpeech inference.'});
+parser.addArgument(['--model'], {required: true, help: 'Path to the model (protocol buffer binary file)'});
+parser.addArgument(['--alphabet'], {required: true, help: 'Path to the configuration file specifying the alphabet used by the network'});
+parser.addArgument(['--lm'], {help: 'Path to the language model binary file', nargs: '?'});
+parser.addArgument(['--trie'], {help: 'Path to the language model trie file created with native_client/generate_trie', nargs: '?'});
+parser.addArgument(['--audio'], {required: true, help: 'Path to the audio file to run (WAV format)'});
+parser.addArgument(['--version'], {action: VersionAction, help: 'Print version and exits'});
+let args = parser.parseArgs();
+
+function totalTime(hrtimeValue) {
+    return (hrtimeValue[0] + hrtimeValue[1] / 1000000000).toPrecision(4);
+}
+
+console.error('Loading model from file %s', args['model']);
+const model_load_start = process.hrtime();
+let model = new Ds.Model(args['model'], N_FEATURES, N_CONTEXT, args['alphabet'], BEAM_WIDTH);
+const model_load_end = process.hrtime(model_load_start);
+console.error('Loaded model in %ds.', totalTime(model_load_end));
+
+if (args['lm'] && args['trie']) {
+    console.error('Loading language model from files %s %s', args['lm'], args['trie']);
+    const lm_load_start = process.hrtime();
+    model.enableDecoderWithLM(args['alphabet'], args['lm'], args['trie'],
+                              LM_WEIGHT, VALID_WORD_COUNT_WEIGHT);
+    const lm_load_end = process.hrtime(lm_load_start);
+    console.error('Loaded language model in %ds.', totalTime(lm_load_end));
+}
+
+const vad = new VAD(VAD.Mode.NORMAL);
+const voice = {START: true, STOP: false};
+let sctx = model.setupStream(150, 16000);
+let state = voice.STOP;
+
+function finishStream() {
+    const model_load_start = process.hrtime();
+    console.error('Running inference.');
+    console.log('Transcription: ', model.finishStream(sctx));
+    const model_load_end = process.hrtime(model_load_start);
+    console.error('Inference took %ds.', totalTime(model_load_end));
+}
+
+let ffmpeg = require('child_process').spawn('ffmpeg', [
+    '-hide_banner',
+    '-nostats',
+    '-loglevel', 'fatal',
+    '-i', args['audio'],
+    '-af', 'highpass=f=200,lowpass=f=3000',
+    '-vn',
+    '-acodec', 'pcm_s16le',
+    '-ac', 1,
+    '-ar', 16000,
+    '-f', 's16le',
+    'pipe:'
+]);
+
+ffmpeg.stdout.on('data', chunk => {
+    vad.processAudio(chunk, 16000).then(res => {
+        switch (res) {
+            case VAD.Event.SILENCE:
+                if (state === voice.START) {
+                    state = voice.STOP;
+                    finishStream();
+                    sctx = model.setupStream(150, 16000);
+                }
+                break;
+            case VAD.Event.VOICE:
+                state = voice.START;
+                model.feedAudioContent(sctx, chunk.slice(0, chunk.length / 2));
+                break;
+        }
+    });
+});
+
+ffmpeg.stdout.on('close', code => {
+    finishStream();
+});

examples/ffmpeg_vad_streaming/package.json

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+{
+  "name": "ffmpeg-vad-streaming",
+  "version": "1.0.0",
+  "description": "Streaming inference from arbitrary source with VAD and FFmpeg",
+  "main": "index.js",
+  "scripts": {
+    "start": "node ./index.js"
+  },
+  "dependencies": {
+    "argparse": "^1.0.10",
+    "deepspeech": "^0.3.0",
+    "node-vad": "^1.1.1",
+    "util": "^0.11.1"
+  },
+  "license": "MIT"
+}

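Since `package.json` defines a `start` script, the example can also be launched through npm; with recent npm versions, arguments after `--` are forwarded to `index.js` (paths are placeholders):

```bash
npm start -- --audio <AUDIO_FILE> --model $HOME/models/output_graph.pbmm --alphabet $HOME/models/alphabet.txt
```
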
examples/mic_vad_streaming/README.md

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,12 @@ Uses portaudio for microphone access, so on Linux, you may need to install its h
 sudo apt install portaudio19-dev
 ```
 
+Installation on macOS may fail due to portaudio; use brew to install it:
+
+```bash
+brew install portaudio
+```
+
 ## Usage
 
 ```

native_client/Android.mk

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+LOCAL_PATH := $(call my-dir)
+
+include $(CLEAR_VARS)
+LOCAL_MODULE := deepspeech-prebuilt
+LOCAL_SRC_FILES := $(TFDIR)/bazel-bin/native_client/libdeepspeech.so
+include $(PREBUILT_SHARED_LIBRARY)
+
+include $(CLEAR_VARS)
+LOCAL_CPP_EXTENSION := .cc .cxx .cpp
+LOCAL_MODULE := deepspeech
+LOCAL_SRC_FILES := client.cc
+LOCAL_SHARED_LIBRARIES := deepspeech-prebuilt
+LOCAL_LDFLAGS := -Wl,--no-as-needed
+include $(BUILD_EXECUTABLE)

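A plausible way to drive this makefile with the Android NDK (a sketch; the ABI, API level, and `TFDIR` location are assumptions for your setup, not prescribed by the commit):

```bash
cd native_client
ndk-build \
    NDK_PROJECT_PATH=. \
    APP_BUILD_SCRIPT=Android.mk \
    APP_PLATFORM=android-21 \
    APP_ABI=arm64-v8a \
    TFDIR=$HOME/tensorflow
```
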