TensorFlowASR implements automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, and Conformer. These models can be converted to TFLite to reduce memory and computation for deployment 😄
- What's New?
- Table of Contents
- 😋 Supported Models
- Installation
- Training & Testing Tutorial
- Feature Extraction
- Augmentations
- TFLite Conversion
- Pretrained Models
- Corpus Sources
- How to contribute
- References & Credits
- Contact
- Transducer models (end-to-end models trained with RNNT loss; currently supported: Conformer, ContextNet, Streaming Transducer)
- CTCModel (end-to-end models trained with CTC loss; currently supported: DeepSpeech2, Jasper; see the minimal CTC loss sketch after the model list below)
- Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100) See examples/conformer
- ContextNet (Reference: http://arxiv.org/abs/2005.03191) See examples/contextnet
- RNN Transducer (Reference: https://arxiv.org/abs/1811.06621) See examples/rnn_transducer
- Deep Speech 2 (Reference: https://arxiv.org/abs/1512.02595) See examples/deepspeech2
- Jasper (Reference: https://arxiv.org/abs/1904.03288) See examples/jasper
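As referenced above, here is a minimal sketch of the CTC loss idea using TensorFlow's built-in tf.nn.ctc_loss with random tensors. This is not the loss implementation this repo trains with; all shapes and values are illustrative only:

import tensorflow as tf

batch, time_steps, vocab = 2, 50, 29  # illustrative shapes; index 0 is the blank token
labels = tf.constant([[5, 12, 8, 0], [7, 3, 0, 0]], dtype=tf.int32)  # zero-padded targets
logits = tf.random.normal([batch, time_steps, vocab])  # stand-in for frame-level model outputs

loss = tf.nn.ctc_loss(
    labels=labels,
    logits=logits,
    label_length=tf.constant([3, 2], dtype=tf.int32),  # true target lengths before padding
    logit_length=tf.fill([batch], time_steps),  # number of frames per utterance
    logits_time_major=False,  # logits are [batch, time, vocab]
    blank_index=0,
)
print(loss)  # per-example negative log-likelihood, shape [batch]

The transducer models swap this loss for the RNNT loss, which conditions on the label history via a prediction network at the cost of a heavier alignment lattice.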
For training and testing, you should install from a git clone so that the necessary packages from other authors (ctc_decoders, rnnt_loss, etc.) can be installed:
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
For anaconda3:
conda create -y -n tfasr tensorflow-gpu python=3.8 # use tensorflow if on CPU; this makes sure conda installs all dependencies for tensorflow
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
To install from PyPI:
# Tensorflow 2.x (with 2.x.x >= 2.3)
pip3 install "TensorFlowASR[tf2.x]" # or pip3 install "TensorFlowASR[tf2.x-gpu]"
To install for development:
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e ".[dev]"
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" or ".[tf2.x-apple]" for Apple M1 machines
Because tensorflow-text is not built for Apple Silicon, it must be installed from a prebuilt wheel provided by sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon (the wheel below targets Python 3.10). Do this after installing TensorFlowASR with TensorFlow as above:
TF_VERSION="$(python3 -c 'import tensorflow; print(tensorflow.__version__)')" && \
TF_VERSION_MAJOR="$(echo $TF_VERSION | cut -d'.' -f1,2)" && \
URL="https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon" && \
pip3 install "${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp310-cp310-macosx_11_0_arm64.whl"
To run in a Docker container:
docker-compose up -d
- For training, please read tutorial_training
- For testing, please read tutorial_testing
FYI: Keras built-in training uses an infinite dataset, which avoids the potential last partial batch (a minimal sketch of the idea follows).
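To see why, here is a sketch in plain tf.data (not this repo's own dataset pipeline): repeating the dataset before batching yields an endless stream of full batches, and steps_per_epoch delimits an epoch.

import tensorflow as tf

# 10 toy examples with batch size 4: a finite dataset would end each epoch with
# a partial batch of 2. Repeating before batching keeps every batch full.
dataset = tf.data.Dataset.range(10).repeat().batch(4)

for step, batch in enumerate(dataset.take(3)):
    print(step, batch.numpy())  # every batch has exactly 4 elements

# model.fit(dataset, steps_per_epoch=..., epochs=...) consumes it the same way.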
See examples for some predefined ASR models and results
See augmentations
After conversion, the TFLite model acts like a function that maps an audio signal directly to Unicode code points, which can then be converted to a string (see the inference sketch after the steps below).
- Install `tf-nightly` using `pip install tf-nightly`
- Build a model with the same architecture as the trained model (if the model has a `tflite` argument, you must set it to `True`), then load the trained weights into the built model
- Load `TFSpeechFeaturizer` and `TextFeaturizer` into the model using the `add_featurizers` function
- Convert model's function to tflite as follows:
import tensorflow as tf

func = model.make_tflite_function(**options)  # options are the arguments of the function
concrete_func = func.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # native TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,  # fall back to TensorFlow ops where needed
]
tflite_model = converter.convert()
- Save the converted tflite model as follows:
import os

if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
- Then the `.tflite` model is ready to be deployed
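To illustrate the audio-to-code-points behavior described above, here is a minimal inference sketch with tf.lite.Interpreter. It assumes the exported function takes a single 1-D float32 waveform input and returns Unicode code points; transducer models exported with streaming states expose extra input/output tensors (inspect input_details to see them), and the file name here is hypothetical:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical path

# Load a 16 kHz mono waveform however you like; random samples here for shape only.
signal = np.random.randn(16000).astype(np.float32)

input_details = interpreter.get_input_details()
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

code_points = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
transcript = "".join(chr(cp) for cp in code_points.flatten())  # code points -> string
print(transcript)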
Go to drive
English:

Name | Source | Hours
---|---|---
LibriSpeech | LibriSpeech | 970h
Common Voice | https://commonvoice.mozilla.org | 1932h
Vietnamese:

Name | Source | Hours
---|---|---
Vivos | https://ailab.hcmus.edu.vn/vivos | 15h
InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h
InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h
- Fork the project
- Install for development
- Create a branch
- Make a pull request to this repo
- NVIDIA OpenSeq2Seq Toolkit
- https://github.com/noahchalifour/warp-transducer
- Sequence Transduction with Recurrent Neural Networks (https://arxiv.org/abs/1211.3711)
- End-to-End Speech Processing Toolkit in PyTorch
- https://github.com/iankur/ContextNet
Huy Le Nguyen
Email: [email protected]