Speech samples #1036

Merged
merged 10 commits into from
Feb 21, 2018
Updated branch to new changes in library. For speech v1p1beta1
nnegrey committed Feb 21, 2018
commit 44786c4ec489c1e4e5a17b58933e0004f4f96a5a
44 changes: 27 additions & 17 deletions speech/cloud-client/README.md
@@ -1,5 +1,8 @@
# Getting Started with Google Cloud Speech API and the Google Cloud Client libraries

<a href="https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/java-docs-samples&page=editor&open_in_editor=speech/cloud-client/README.md">
<img alt="Open in Cloud Shell" src ="http://gstatic.com/cloudssh/images/open-btn.png"></a>

[Google Cloud Speech API][speech] enables easy integration of Google speech
recognition technologies into developer applications.

@@ -9,76 +12,83 @@ using the [Google Cloud Client Library for Java][google-cloud-java].
[speech]: https://cloud.google.com/speech/docs/
[google-cloud-java]: https://github.com/GoogleCloudPlatform/google-cloud-java

## Quickstart
## Setup

Install [Maven](http://maven.apache.org/).

Build your project with:

mvn clean compile assembly:single

### Transcribe a local audio file (using the quickstart sample)
```
mvn clean compile assembly:single
```

java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.QuickstartSample
## Quickstart
Transcribe a local audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.QuickstartSample
```

### Transcribe a local audio file (using the recognize sample)
## Transcribe an audio file
Transcribe a local audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize syncrecognize ./resources/audio.raw
```

### Asynchronously transcribe a local audio file (using the recognize sample)
Asynchronously transcribe a local audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize asyncrecognize ./resources/audio.raw
```

### Transcribe a remote audio file (using the recognize sample)
Transcribe a remote audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize syncrecognize gs://cloud-samples-tests/speech/brooklyn.flac
```

### Asynchronously transcribe a remote audio file (using the recognize sample)
Asynchronously transcribe a remote audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize asyncrecognize gs://cloud-samples-tests/speech/vr.flac
```

### Synchronously transcribe an audio file and print word offsets
## Transcribe an audio file and print word offsets
Synchronously transcribe an audio file and print word offsets
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize wordoffsets ./resources/audio.raw
```

### Asynchronously transcribe a remote audio file and print word offsets
Asynchronously transcribe a remote audio file and print word offsets
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize wordoffsets gs://cloud-samples-tests/speech/vr.flac
```

### Synchronously transcribe and punctuate a remote audio file
## Transcribe and punctuate an audio file
Synchronously transcribe and punctuate a local audio file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize punctuation ./resources/audio.raw
```

### Asynchronously transcribe and punctuate an audio file hosted on GCS
Asynchronously transcribe and punctuate an audio file hosted on GCS
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize punctuation gs://cloud-samples-tests/speech/brooklyn.flac
```

### Synchronously transcribe a video file
## Transcribe a video file
Synchronously transcribe a video file
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize video ./resources/Google_Gnome.wav
```

### Asynchronously transcribe a video file hosted on GCS
Asynchronously transcribe a video file hosted on GCS
```
java -cp target/speech-google-cloud-samples-1.0.0-jar-with-dependencies.jar \
com.example.speech.Recognize video gs://cloud-samples-tests/speech/Google_Gnome.wav
```
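
Each command above passes a command name and a path to `com.example.speech.Recognize`, and the `main` method in this change picks the local or Cloud Storage variant by checking whether the path starts with `gs://`. A minimal sketch of that dispatch follows; it is not the sample's actual code. The class `CommandDispatchSketch` and the method `handlerFor` are hypothetical, the handler names are returned as plain strings rather than invoked, and the mapping for `asyncrecognize` is an assumption, since only the `streamrecognize` and `video` branches are visible in the diff.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the command dispatch in Recognize.main.
 * It returns handler names as strings and never calls the Speech API.
 */
class CommandDispatchSketch {

  /** Returns the name of the method that would handle the command, if the command is known. */
  static Optional<String> handlerFor(String command, String path) {
    // Paths with the gs:// scheme are treated as Cloud Storage URIs, anything else as a local file.
    boolean isGcs = path.startsWith("gs://");
    switch (command) {
      case "asyncrecognize":
        // Assumed mapping: this branch is not shown in the diff.
        return Optional.of(isGcs ? "asyncRecognizeGcs" : "asyncRecognizeFile");
      case "video":
        return Optional.of(isGcs ? "transcribeGcsVideoFile" : "transcribeVideoFile");
      case "streamrecognize":
        // Streaming only has a local-file variant in the diff.
        return Optional.of("streamingRecognizeFile");
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(handlerFor("video", "gs://cloud-samples-tests/speech/Google_Gnome.wav"));
    System.out.println(handlerFor("video", "./resources/Google_Gnome.wav"));
  }
}
```

Returning an `Optional` keeps the unknown-command case explicit; the real sample may instead print usage text or throw.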

5 changes: 2 additions & 3 deletions speech/cloud-client/pom.xml
@@ -1,5 +1,5 @@
<!--
Copyright 2017, Google Inc.
Copyright 2018, Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -34,12 +34,11 @@
</properties>

<dependencies>
<!-- TODO replace with release version -->
<!-- [START dependencies] -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-speech</artifactId>
<version>0.22.1-alpha-SNAPSHOT</version>
<version>0.35.0-alpha</version>
</dependency>
<!-- [END dependencies] -->

Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright 2017, Google Inc.
Copyright 2018, Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -18,7 +18,7 @@

// [START speech_quickstart]
// Imports the Google Cloud client library
import com.google.cloud.speech.v1_1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
119 changes: 13 additions & 106 deletions speech/cloud-client/src/main/java/com/example/speech/Recognize.java
@@ -1,5 +1,5 @@
/*
Copyright 2017, Google Inc.
Copyright 2018, Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -16,17 +16,15 @@

package com.example.speech;

import com.google.api.gax.longrunning.OperationFuture;
import com.google.api.gax.rpc.ApiStreamObserver;
import com.google.api.gax.rpc.BidiStreamingCallable;
import com.google.api.gax.rpc.OperationFuture;
import com.google.cloud.speech.v1_1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeMetadata;
import com.google.cloud.speech.v1p1beta1.LongRunningRecognizeResponse;
import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1p1beta1.RecognitionMetadata;
import com.google.cloud.speech.v1p1beta1.RecognitionMetadata.OriginalMediaType;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionAlternative;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
@@ -36,7 +34,6 @@
import com.google.cloud.speech.v1p1beta1.StreamingRecognizeResponse;
import com.google.cloud.speech.v1p1beta1.WordInfo;
import com.google.common.util.concurrent.SettableFuture;
import com.google.longrunning.Operation;
import com.google.protobuf.ByteString;

import java.io.IOException;
@@ -86,12 +83,6 @@ public static void main(String... args) throws Exception {
}
} else if (command.equals("streamrecognize")) {
streamingRecognizeFile(path);
} else if (command.equals("punctuation")) {
if (path.startsWith("gs://")) {
transcribeGcsWithAutomaticPunctuation(path);
} else {
transcribeFileWithAutomaticPunctuation(path);
}
} else if (command.equals("video")) {
if (path.startsWith("gs://")) {
transcribeGcsVideoFile(path);
@@ -235,7 +226,7 @@ public static void asyncRecognizeFile(String fileName) throws Exception {
.build();

// Use non-blocking call for getting file transcription
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata, Operation> response =
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
speech.longRunningRecognizeAsync(config, audio);

while (!response.isDone()) {
@@ -276,8 +267,7 @@ public static void asyncRecognizeWords(String gcsUri) throws Exception {
.build();

// Use non-blocking call for getting file transcription
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata,
Operation> response =
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
speech.longRunningRecognizeAsync(config, audio);
while (!response.isDone()) {
System.out.println("Waiting for response...");
@@ -324,7 +314,7 @@ public static void asyncRecognizeGcs(String gcsUri) throws Exception {
.build();

// Use non-blocking call for getting file transcription
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata, Operation> response =
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
speech.longRunningRecognizeAsync(config, audio);
while (!response.isDone()) {
System.out.println("Waiting for response...");
@@ -427,80 +417,7 @@ public SettableFuture<List<T>> future() {
speech.close();
}


/**
* Performs transcription with automatic punctuation on raw PCM audio data.
*
* @param fileName the path to a PCM audio file to transcribe.
*/
public static void transcribeFileWithAutomaticPunctuation(String fileName) throws Exception {
Path path = Paths.get(fileName);
byte[] content = Files.readAllBytes(path);

// [START transcribe_file_with_automatic_punctuation]
try (SpeechClient speechClient = SpeechClient.create()) {
// Configure request with local raw PCM audio
RecognitionConfig recConfig = RecognitionConfig.newBuilder()
.setEncoding(AudioEncoding.LINEAR16)
.setLanguageCode("en-US")
.setSampleRateHertz(16000)
.setEnableAutomaticPunctuation(true)
.build();

RecognitionAudio recognitionAudio = RecognitionAudio.newBuilder()
.setContent(ByteString.copyFrom(content))
.build();

RecognizeResponse recognizeResponse = speechClient.recognize(recConfig, recognitionAudio);
// Just print the first result here.
SpeechRecognitionResult result = recognizeResponse.getResultsList().get(0);
// There can be several alternative transcripts for a given chunk of speech. Just use the
// first (most likely) one here.
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
}
// [END transcribe_file_with_automatic_punctuation]
}

/**
* Performs transcription on remote FLAC file and prints the transcription.
*
* @param gcsUri the path to the remote FLAC audio file to transcribe.
*/
public static void transcribeGcsWithAutomaticPunctuation(String gcsUri) throws Exception {
// [START transcribe_gcs_with_automatic_punctuation]
try (SpeechClient speechClient = SpeechClient.create()) {

RecognitionConfig config = RecognitionConfig.newBuilder()
.setEncoding(AudioEncoding.FLAC)
.setLanguageCode("en-US")
.setSampleRateHertz(16000)
.setEnableAutomaticPunctuation(true)
.build();
RecognitionAudio audio = RecognitionAudio.newBuilder()
.setUri(gcsUri)
.build();

// Use non-blocking call for getting file transcription
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata, Operation> response =
speechClient.longRunningRecognizeAsync(config, audio);

while (!response.isDone()) {
System.out.println("Waiting for response...");
Thread.sleep(10000);
}

// Just print the first result here.
SpeechRecognitionResult result = response.get().getResultsList().get(0);

// There can be several alternative transcripts for a given chunk of speech. Just use the
// first (most likely) one here.
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
}
// [START transcribe_gcs_with_automatic_punctuation]
}

// [START speech_transcribe_model_selection]
/**
* Performs transcription of the given audio file synchronously with
* video as the original media type.
@@ -510,21 +427,15 @@ public static void transcribeVideoFile(String fileName) throws Exception {
Path path = Paths.get(fileName);
byte[] content = Files.readAllBytes(path);

// [START transcribe_video_file]
try (SpeechClient speech = SpeechClient.create()) {

RecognitionMetadata recognitionMetadata = RecognitionMetadata.newBuilder()
.setOriginalMediaType(OriginalMediaType.VIDEO)
.build();

// Configure request with video media type
RecognitionConfig recConfig = RecognitionConfig.newBuilder()
// encoding may either be omitted or must match the value in the file header
.setEncoding(AudioEncoding.LINEAR16)
.setLanguageCode("en-US")
// sample rate hertz may either be omitted or must match the value in the file header
.setSampleRateHertz(16000)
.setMetadata(recognitionMetadata)
.setModel("video")
.build();

RecognitionAudio recognitionAudio = RecognitionAudio.newBuilder()
@@ -540,38 +451,34 @@ public static void transcribeVideoFile(String fileName) throws Exception {
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
}
// [END transcribe_video_file]
// [END speech_transcribe_model_selection]
}

// [START speech_transcribe_model_selection_gcs]
/**
* Performs transcription on remote video file and prints the transcription.
*
* @param gcsUri the path to the remote video file to transcribe.
*/
public static void transcribeGcsVideoFile(String gcsUri) throws Exception {
// [START transcribe_video_gcs]
try (SpeechClient speech = SpeechClient.create()) {

RecognitionMetadata recognitionMetadata = RecognitionMetadata.newBuilder()
.setOriginalMediaType(OriginalMediaType.VIDEO)
.build();

// Configure request with video media type
RecognitionConfig config = RecognitionConfig.newBuilder()
// encoding may either be omitted or must match the value in the file header
.setEncoding(AudioEncoding.LINEAR16)
.setLanguageCode("en-US")
// sample rate hertz may either be omitted or must match the value in the file header
.setSampleRateHertz(16000)
.setMetadata(recognitionMetadata)
.setModel("video")
.build();

RecognitionAudio audio = RecognitionAudio.newBuilder()
.setUri(gcsUri)
.build();

// Use non-blocking call for getting file transcription
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata, Operation> response =
OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
speech.longRunningRecognizeAsync(config, audio);

while (!response.isDone()) {
@@ -588,6 +495,6 @@ public static void transcribeGcsVideoFile(String gcsUri) throws Exception {
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
}
// [START transcribe_video_gcs]
// [END speech_transcribe_model_selection_gcs]
}
}
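
The asynchronous methods in this file all share one pattern: start a long-running operation, poll `isDone()` in a loop while sleeping, then call `get()` for the result. The sketch below shows that pattern against the standard `java.util.concurrent.Future` interface, so it runs without the Speech client or credentials. `PollingSketch`, `waitFor`, the 50 ms fake delay and the placeholder result string are all hypothetical. As far as I know, `OperationFuture` extends `ApiFuture`, which in turn extends `Future`, but treat that as an assumption to check against the gax version in use.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the isDone()/sleep polling loop used by the async samples. */
class PollingSketch {

  /** Polls the future until it completes, then returns its result. */
  static <T> T waitFor(Future<T> future, long pollMillis) throws Exception {
    while (!future.isDone()) {
      System.out.println("Waiting for response...");
      Thread.sleep(pollMillis);
    }
    // The future is already done here, so get() returns (or throws) without blocking.
    return future.get();
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for speech.longRunningRecognizeAsync(config, audio): completes after a short delay.
    CompletableFuture<String> fake = CompletableFuture.supplyAsync(() -> {
      try {
        TimeUnit.MILLISECONDS.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "placeholder transcript";
    });
    System.out.printf("Transcript : %s%n", waitFor(fake, 10));
  }
}
```

Because `get()` already blocks until the operation finishes, the explicit loop mainly buys the progress message; calling `get()` with a timeout is a common alternative when no progress output is needed.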
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright 2017, Google, Inc.
Copyright 2018, Google, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.