sala2000/leopard

 
 

Leopard

Made in Vancouver, Canada by Picovoice

Leopard is an on-device Speech-to-Text engine.

Leopard is:

  • offline: runs locally, without an Internet connection.
  • highly accurate [1].
  • compact and computationally efficient [1].
  • cross-platform. Linux (x86_64), Mac (x86_64), Windows (x86_64), web browsers, Android, iOS, Raspberry Pi, and BeagleBone are supported. Linux (x86_64) is available for personal and non-commercial use, free of charge. Other platforms are only available under a commercial license.
  • customizable. New words can be added and the engine can be adapted to different contexts (available only under a commercial license).

License

This repository is provided for personal & non-commercial use only. Refer to LICENSE for details. If you wish to use Leopard in a commercial product, contact Picovoice.

Use Cases

Leopard is intended for open-domain transcription applications. It transcribes complete recordings (i.e. file-based processing) rather than streaming audio.

  • If real-time feedback (incremental transcription results) is required, see Cheetah.
  • If you need to understand naturally-spoken (complex) commands within a specific domain, see Rhino.
  • If you need to recognize a small set of fixed voice commands or activate a device using voice, see Porcupine.

Structure of Repository

Leopard is shipped as a dynamic library. The binary files for supported platforms are located under lib, and header files are under include. Bindings are available under binding to facilitate usage from higher-level languages and platforms. Demo applications are under demo. Finally, resources is a placeholder for data used by various applications within the repository.

Picovoice Console and License File

In order to run, Leopard requires a valid license file ('.lic' extension). To obtain a time-limited evaluation license file, visit Picovoice Console. To obtain a commercial license, contact Picovoice.

Running Demo Applications

Python Demo Application

The demo transcribes a set of audio files provided as command-line arguments. It has been tested with Python 3.6 on a machine running Ubuntu 18.04 (x86_64). The audio files must be single-channel, 16 kHz, and 16-bit linearly encoded. For more information about audio requirements, refer to pv_leopard.h. The following command transcribes the WAV file located in the resources directory:

python demo/python/leopard_demo.py --audio_paths resources/audio_samples/test.wav --license_path ${PATH_TO_YOUR_LEOPARD_LICENSE_FILE}

To transcribe multiple files, provide their paths:

python demo/python/leopard_demo.py --audio_paths ${PATH_TO_AUDIO_FILE_1} ${PATH_TO_AUDIO_FILE_2} ${PATH_TO_AUDIO_FILE_3} --license_path ${PATH_TO_YOUR_LEOPARD_LICENSE_FILE}
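Because the audio requirements above are strict, it can help to check a file before passing it to the demo. The following is a minimal sketch using only the Python standard library; check_wav_format is a hypothetical helper name and is not part of this repository, and the 16000 default simply mirrors the 16 kHz requirement stated above.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file looks compatible.

    Hypothetical helper, not part of the Leopard repository.
    """
    problems = []
    # wave.open raises wave.Error if the file is not a WAV file it can parse.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected 1 channel, found %d" % wav.getnchannels())
        if wav.getframerate() != expected_rate:
            problems.append(
                "expected %d Hz, found %d Hz" % (expected_rate, wav.getframerate())
            )
        # Sample width is reported in bytes, so 16-bit audio has a width of 2.
        if wav.getsampwidth() != 2:
            problems.append(
                "expected 16-bit samples, found %d-bit" % (8 * wav.getsampwidth())
            )
    return problems
```

Files for which the helper returns a non-empty list would need to be converted before transcription.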

C Demo Application

This demo application accepts a list of WAV files as input and returns their transcripts. The demo expects the audio files to be WAV, 16 kHz, and 16-bit linearly encoded. It does not verify that the input audio files are compatible or correct. Set the current working directory to the root of the repository. The demo can be built using gcc:

gcc -I include/ -O3 demo/c/leopard_demo.c -ldl -o leopard_demo

The usage information can be obtained by running the demo without arguments:

./leopard_demo

Then it can be used as follows:

./leopard_demo \
./lib/linux/x86_64/libpv_leopard.so \
./lib/common/acoustic_model.pv \
./lib/common/language_model.pv \
${PATH_TO_YOUR_LEOPARD_LICENSE_FILE} \
./resources/audio_samples/test.wav

Integration

Python

leopard.py provides a Python binding. Below is a quick demonstration of how to construct an instance:

library_path = ...  # the file is available under lib/linux/x86_64/libpv_leopard.so
acoustic_model_path = ...  # the file is available under lib/common/acoustic_model.pv
language_model_path = ...  # the file is available under lib/common/language_model.pv
license_path = ...  # The .lic file is available from Picovoice Console (https://picovoice.ai/console/)

handle = Leopard(library_path, acoustic_model_path, language_model_path, license_path)

Once initialized, the valid sample rate can be retrieved using handle.sample_rate. Leopard accepts single-channel, 16-bit linearly encoded audio.

audio = ... # audio data to be transcribed as a NumPy array

transcript = handle.process(audio)
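The audio array can come from any source. As one possible approach, the sketch below reads a WAV file into a NumPy array using the standard wave module. read_pcm is a hypothetical helper name, and the choice of an int16 array is an assumption based on the 16-bit requirement above; consult leopard.py for the exact type the binding expects.

```python
import wave

import numpy as np


def read_pcm(path):
    """Read a mono, 16-bit PCM WAV file and return (samples, sample_rate).

    Hypothetical helper, not part of the Leopard repository.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected single-channel, 16-bit audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian; "<i2" is a little-endian 16-bit integer.
    return np.frombuffer(frames, dtype="<i2"), sample_rate


# Possible usage (assumes `handle` was constructed as shown above):
# audio, sample_rate = read_pcm("resources/audio_samples/test.wav")
# assert sample_rate == handle.sample_rate
# transcript = handle.process(audio)
```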

When finished, release the acquired resources:

handle.delete()
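If an exception is raised between construction and the call to delete(), the resources would not be released. One way to guard against that is a small context manager, sketched below. The name released is hypothetical and not part of leopard.py; the sketch only assumes the handle exposes the delete() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def released(handle):
    """Yield the handle and call handle.delete() on exit, even after an error.

    Hypothetical helper, not part of the Leopard repository.
    """
    try:
        yield handle
    finally:
        handle.delete()


# Possible usage (assumes the variables from the snippets above):
# with released(Leopard(library_path, acoustic_model_path,
#                       language_model_path, license_path)) as handle:
#     transcript = handle.process(audio)
```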

C

Leopard is implemented in ANSI C and can therefore be linked directly into C applications. The pv_leopard.h header file contains relevant information. An instance of the Leopard object can be constructed as follows:

const char *acoustic_model_path = ... // the file is available under lib/common/acoustic_model.pv
const char *language_model_path = ... // the file is available under lib/common/language_model.pv
const char *license_path = ... // The .lic file is available from Picovoice Console (https://picovoice.ai/console/)

pv_leopard_t *handle;
const pv_status_t status = pv_leopard_init(acoustic_model_path, language_model_path, license_path, &handle);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Now the handle can be used to process audio. Leopard accepts single-channel, 16-bit linearly encoded audio. The sample rate can be retrieved using pv_sample_rate().

const int16_t *pcm = ... // audio data to be transcribed
const int32_t num_samples = ... // number of samples to process

char *transcript;
const pv_status_t status = pv_leopard_process(handle, pcm, num_samples, &transcript);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

// the caller is required to free the transcription buffer when done.
free(transcript);

Finally, when done, be sure to release the acquired resources:

pv_leopard_delete(handle);

Releases

V1.0.0 — January 14th, 2020

  • Initial release.

About

On-device speech-to-text engine powered by deep learning
