Uses MPS (Mac acceleration) by default when available #382


Open: wants to merge 4 commits into main

Conversation

dwarkeshsp
Copy link

Currently, Whisper defaults to the CPU on macOS devices, even though PyTorch has introduced the Metal Performance Shaders (MPS) backend for Apple devices in its nightly releases (more info).

With my changes to __init__.py, torch checks whether MPS is available when no device has been specified. If MPS is available and CUDA is not, Whisper defaults to MPS.

This way, Mac users get speedups from their GPU by default.
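
For reference, a rough sketch of the fallback order described above (not the exact diff in this PR; it assumes a PyTorch build that exposes torch.backends.mps):

import torch

def default_device() -> str:
    # Fallback order described above: CUDA first, then MPS, then CPU.
    if torch.cuda.is_available():
        return "cuda"
    if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# In whisper.load_model(), this would only apply when the caller has not
# passed an explicit device, e.g.:
#
#     if device is None:
#         device = default_device()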

@usergit
Copy link

usergit commented Oct 21, 2022

@dwarkeshsp have you measured any speedups compared to using the CPU?

@Michcioperz
Copy link

Doesn't this also require switching FP16 off?

@DiegoGiovany
Copy link

DiegoGiovany commented Nov 9, 2022

I'm getting this error when trying to use MPS:

/Users/diego/.pyenv/versions/3.10.6/lib/python3.10/site-packages/whisper-1.0-py3.10.egg/whisper/decoding.py:629: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/diego/Projects/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/2d9b4df9-4b93-11ed-b0fc-2e32217d8374/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 23200 bytes
'
Abort trap: 6
/Users/diego/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

any clues?

@glangford
Copy link

@DiegoGiovany Not an expert on this, but it looks like PyTorch itself is missing some operators for MPS. See for example
pytorch/pytorch#77764 (comment)
(which refers to repeat_interleave)

and
pytorch/pytorch#87219

@gltanaka
Copy link

gltanaka commented Nov 17, 2022

Thanks for your work. I just tried this. Unfortunately, it didn't work for me on my M1 Max with 32GB.
Here is what I did:
pip install git+https://github.com/openai/whisper.git@refs/pull/382/head

No errors on install and it works fine when run without mps: whisper audiofile_name --model medium

When I run: whisper audiofile_name --model medium --device mps

Here is the error I get:
Detecting language using up to the first 30 seconds. Use --language to specify the language
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1024x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

When I run: whisper audiofile_name --model medium --device mps --fp16 False

Here is the error I get:
Detecting language using up to the first 30 seconds. Use --language to specify the language
Detected language: English
/anaconda3/lib/python3.9/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/f0468ab4-4115-11ed-8edc-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1007280 bytes

Basically, same error as @DiegoGiovany.

Any ideas on how to fix?

@megeek
Copy link

megeek commented Nov 28, 2022

+1 for me! I'm actually using an Intel Mac with Radeon Pro 560X 4 GB...

@glangford
Copy link

Related
pytorch/pytorch#87351

@PhDLuffy
Copy link

PhDLuffy commented Dec 8, 2022

@dwarkeshsp

It doesn't work for me on a 2015 MBP with PyTorch 1.3 stable, an RX 580 eGPU, and macOS 12.3.

I changed the code to match yours.

When I switch to --device mps it shows an error; maybe there is still something to change or modify.

With --device cpu, it works.

With other PyTorch-Metal projects, MPS works.

@changeling
Copy link

What's the status on this?

@jongwook
Copy link
Collaborator

I also see the same errors as others mentioned above, on an M1 Mac running arm64 Python.

@changeling
Copy link

changeling commented Jan 19, 2023

On an M1 16" MBP with 16GB running macOS 13.0.1, I'm seeing the following with openai-whisper-20230117:

Using this command:
(venv) whisper_ai_playground % whisper './test_file.mp3' --model tiny.en --output_dir ./output --device mps

I'm encountering the following errors:

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x384x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible

LLVM ERROR: Failed to infer result type(s).

zsh: abort whisper --model tiny.en --output_dir ./output --device mps

  warnings.warn('resource_tracker: There appear to be %d '

@sachit-menon
Copy link

Is there any update on this, or did anyone figure out how to get it to work?

@renderpci
Copy link

renderpci commented Feb 5, 2023

Same problem with macOS 13.2 on a MacBook Pro M2 Max:

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      whisper audio.wav --language en --model large
m2@Render ~ % /opt/homebrew/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@DontEatOreo
Copy link

I'm getting the same error as @renderpci using the M1 Base Model

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x512x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    3746 abort      python3 test.py

test.py:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

@saurabhsharan
Copy link

saurabhsharan commented Feb 6, 2023

FWIW I switched to the C++ port https://github.com/ggerganov/whisper.cpp/ and got a ~15x speedup compared to CPU pytorch on my M1 Pro. (But note that it doesn't have all the features/flags from the official whisper repo.)

@renderpci
Copy link

renderpci commented Feb 6, 2023

FWIW I switched to the C++ port https://github.com/ggerganov/whisper.cpp/

For us whisper.cpp is not an option:

Should I use whisper.cpp in my project?

whisper.cpp is a hobby project. It does not strive to provide a production ready implementation. The main goals of the implementation is to be educational, minimalistic, portable, hackable and performant. There are no guarantees that the implementation is correct and bug-free and stuff can break at any point in the future. Support and updates will depend mostly on contributions, since with time I will move on and won't dedicate too much time on the project.

If you plan to use whisper.cpp in your own project, keep in mind the above.
My advice is to not put all your eggs into the whisper.cpp basket.

@devpacdd
Copy link

devpacdd commented Feb 7, 2023

I get the same error as @renderpci on an M2:

whisper interview.mp4 --language en --model large --device mps

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      whisper interview.mp4 --language en --model large --device mps
pac@dd ~ % /opt/homebrew/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@DenisVieriu97
Copy link

DenisVieriu97 commented Feb 21, 2023

Hey @devpacdd - this should be fixed in the latest PyTorch nightly (pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu). Let me know if you still see any issues. Thanks

@manuthebyte
Copy link

manuthebyte commented Feb 21, 2023

Still have the same error after updating

Edit: After adding --fp16 False to the command, I now get a new error, as well as the old one:

/opt/homebrew/lib/python3.10/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1007280 bytes
'
zsh: abort      whisper --model large --language de --task transcribe  --device mps --fp16
/opt/homebrew/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

@cameronbergh
Copy link

I was able to get it to kinda work: davabase/whisper_real_time#5 (comment)

@DenisVieriu97
Copy link

The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)

@manuthebyte could you please make sure you are on a recent nightly? repeat_interleave should be natively supported. If you could grab today's nightly and give it a try, that would be awesome! (You can get today's nightly with pip3 install --pre --force-reinstall torch==2.0.0.dev20230224 --index-url https://download.pytorch.org/whl/nightly/cpu)

@cameronbergh
Copy link

cameronbergh commented Feb 25, 2023

Wow!

When running:
python3 transcribe_demo.py --model medium (from https://github.com/davabase/whisper_real_time)

with the following packages in my pipenv's requirements.txt:

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyAudio==0.2.13
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230224
torchaudio==0.13.1
torchvision==0.14.1
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

It gets every word, even while I was singing, in real time, with maybe ~50% GPU usage on the Apple M2 Pro Max.

@andrewguy9
Copy link

I did some performance testing of MPS vs CPU on an Apple M2 Pro.

I tested a 30-second clip for performance and accuracy with every model size, on both CPU and MPS.

Here is the MPS version (chart: m2pro_mps; Vega-Lite spec below):

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "Model": "tiny.en", "Transcribe Time vs Audio Time": 0.214962963, "Perfect": false },
      { "Model": "tiny", "Transcribe Time vs Audio Time": 0.267037037, "Perfect": false },
      { "Model": "base.en", "Transcribe Time vs Audio Time": 0.2654814815, "Perfect": false },
      { "Model": "base", "Transcribe Time vs Audio Time": 0.3830740741, "Perfect": false },
      { "Model": "small.en", "Transcribe Time vs Audio Time": 0.6409259259, "Perfect": true },
      { "Model": "small", "Transcribe Time vs Audio Time": 0.7203333333, "Perfect": true },
      { "Model": "medium.en", "Transcribe Time vs Audio Time": 2.406666667, "Perfect": false },
      { "Model": "medium", "Transcribe Time vs Audio Time": 1.545925926, "Perfect": true },
      { "Model": "large", "Transcribe Time vs Audio Time": 3.214814815, "Perfect": true },
      { "Model": "large-v1", "Transcribe Time vs Audio Time": 3.090740741, "Perfect": true },
      { "Model": "large-v2", "Transcribe Time vs Audio Time": 3.181481481, "Perfect": true }
    ]
  },
  "mark": {
    "type": "bar",
    "color": {
      "condition": {
        "test": "datum.Perfect === true",
        "value": "green"
      },
      "value": "red"
    }
  },
  "encoding": {
    "x": {
      "field": "Model",
      "type": "nominal",
      "sort": null
    },
    "y": {
      "field": "Transcribe Time vs Audio Time",
      "type": "quantitative"
    }
  }
}

Here is the CPU version (chart: m2pro_cpu; Vega-Lite spec below):

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "Model": "tiny.en", "Transcribe Time vs Audio Time": 0.101037037, "Perfect": false },
      { "Model": "tiny", "Transcribe Time vs Audio Time": 0.1236296296, "Perfect": false },
      { "Model": "base.en", "Transcribe Time vs Audio Time": 0.1630740741, "Perfect": false },
      { "Model": "base", "Transcribe Time vs Audio Time": 0.227962963, "Perfect": false },
      { "Model": "small.en", "Transcribe Time vs Audio Time": 0.4248888889, "Perfect": true },
      { "Model": "small", "Transcribe Time vs Audio Time": 0.5275925926, "Perfect": true },
      { "Model": "medium.en", "Transcribe Time vs Audio Time": 1.728296296, "Perfect": false },
      { "Model": "medium", "Transcribe Time vs Audio Time": 1.181074074, "Perfect": true },
      { "Model": "large", "Transcribe Time vs Audio Time": 2.648148148, "Perfect": true },
      { "Model": "large-v1", "Transcribe Time vs Audio Time": 2.619259259, "Perfect": true },
      { "Model": "large-v2", "Transcribe Time vs Audio Time": 2.654814815, "Perfect": true }
    ]
  },
  "mark": {
    "type": "bar"
  },
  "encoding": {
    "x": {
      "field": "Model",
      "type": "nominal",
      "sort": null
    },
    "y": {
      "field": "Transcribe Time vs Audio Time",
      "type": "quantitative"
    },
    "color": {
      "field": "Perfect",
      "type": "nominal",
      "scale": {
        "domain": [false, true],
        "range": ["red", "green"]
      }
    }
  }
}

CPU performs better on smaller models, and MPS performs better on larger models.

A value of 1 means the transcribe time equals the audio duration; a value of 2 means it takes 2 seconds to transcribe 1 second of audio.
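
For anyone who wants to reproduce this kind of measurement, a rough sketch of how the ratio could be computed (the clip path, its duration, and the model name are placeholders; as noted throughout this thread, the MPS pass may still fail depending on the PyTorch version):

import time

import whisper

AUDIO = "clip_30s.wav"   # placeholder: any ~30 second test clip
AUDIO_SECONDS = 30.0     # duration of that clip

for device in ("cpu", "mps"):
    model = whisper.load_model("base", device=device)
    start = time.time()
    # fp16 is disabled so CPU and MPS run the same fp32 path.
    model.transcribe(AUDIO, fp16=False)
    elapsed = time.time() - start
    print(f"{device}: {elapsed / AUDIO_SECONDS:.2f}x audio duration")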

@salamer
Copy link

salamer commented Apr 22, 2023

Any progress? Or does whisper have any other means of accelerating inference?

@hqucsx
Copy link

hqucsx commented Aug 2, 2023

@mukulpatnaik My device is an M1 MacBook Pro. I got the same error with the latest version of whisper (v20230314); after switching to v20230124, everything works fine (torch nightly version).

But it seems like MPS is slower than CPU for my task, as @renderpci reported:

  • cpu 3.26 s
  • mps 5.25 s
  • cpu+torch2 compile 3.31 s
  • mps+torch2 compile 4.94 s

🫠

great it worked for me

@KnechtNoobrecht
Copy link

I got it working too, but on an Intel machine (5600M, i9-9980HK) and it does not seem to be doing anything.
It is using 40% GPU and 10% CPU, but no progress. Not even the progress bar comes up.
Can anyone reproduce?

@anvart
Copy link

anvart commented Aug 26, 2023

I got it working too, but on an Intel machine (5600M, i9-9980HK) and it does not seem to be doing anything. It is using 40% GPU and 10% CPU, but no progress. Not even the progress bar comes up. Can anyone reproduce?

@KnechtNoobrecht MPS is for Apple Silicon (M1/M2); please correct me if I am wrong.

@KnechtNoobrecht
Copy link

@KnechtNoobrecht MPS is for Apple Silicon (M1/M2); please correct me if I am wrong.

https://developer.apple.com/metal/pytorch/
According to their own documentation, it is not Apple Silicon exclusive.
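
If it helps narrow this down, a quick way to see what a given PyTorch build reports (this assumes a PyTorch version new enough to expose torch.backends.mps):

import torch

# is_built(): this PyTorch build was compiled with MPS support
# is_available(): MPS can actually be used on this machine / OS version
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())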

@anvart
Copy link

anvart commented Aug 27, 2023

@KnechtNoobrecht

True, it can also run on AMD GPUs.

@renderpci
Copy link

Hi

PyTorch was broken again!

I have the same error message as #382 (comment):

Traceback (most recent call last):
  File "/Users/render/Library/Python/3.9/bin/whisper", line 8, in <module>
    sys.exit(cli())
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/whisper/transcribe.py", line 444, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/whisper/__init__.py", line 154, in load_model
    return model.to(device)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1161, in to
    return self._apply(convert)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 858, in _apply
    self._buffers[key] = fn(buf)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31188 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:27199 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26838 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:944 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:515 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1387 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1135 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:807 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:302 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17268 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:379 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:245 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:744 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:772 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]


@mstephenson6
Copy link

I have large-v2 running on the M1 Pro GPU (2021 MBP) today; a big thank you to @linroex above for the pip freeze output. Starting from that and working through pip problems, I got to the requirements.txt below, installed in a fresh Python 3.11 conda environment.

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch
torchaudio
torchvision
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

To watch the transcription output live as it's inferred, I added a sys.stderr.flush() line at lib/python3.11/site-packages/whisper/transcribe.py:175.
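
A minimal illustration of that kind of change (the exact code around transcribe.py:175 differs between versions; the point is just to flush the stream right after each live segment is printed so it is not held in a buffer):

import sys

def emit_segment(line: str) -> None:
    # Print the segment and flush immediately so it shows up in real time.
    print(line, file=sys.stderr)
    sys.stderr.flush()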

@chicman
Copy link

chicman commented Nov 5, 2023

I tried to set up the same environment but still got errors. M1 Pro (MPS), macOS 14.1.

Traceback (most recent call last):
  File "/Users/jm/miniconda3/bin/whisper", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/whisper/transcribe.py", line 310, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/whisper/__init__.py", line 115, in load_model
    checkpoint = torch.load(fp, map_location=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1024, in load
    return _load(opened_zipfile,
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1432, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1402, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1376, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1306, in restore_location
    return default_restore_location(storage, map_location)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 394, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with MPS)

My requirements.txt:

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch
torchaudio
torchvision
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

@salamer
Copy link

salamer commented Nov 28, 2023

Any progress?

@kingname
Copy link

kingname commented Dec 2, 2023

$ whisper pie-ep91.mp3 --model small --output_format txt --device mps
Traceback (most recent call last):
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/bin/whisper", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/transcribe.py", line 458, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/__init__.py", line 156, in load_model
    return model.to(device)
           ^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 849, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31357 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:27248 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26984 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:944 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:515 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1387 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1135 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:807 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17346 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:378 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:244 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:720 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:746 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:158 [backend fallback]

@juanluisrto
Copy link

I am using Whisper via Hugging Face pipelines, where you can specify MPS as the device.
However, after some tests I see that CPU-only is still faster.

My guess is that not all PyTorch operations are compatible with MPS yet, as can be seen in this issue: pytorch/pytorch#77764

For an 11-second audio clip it takes 0.81 s on CPU and 1.23 s on GPU.

This is how I compare both approaches:

import gradio as gr
from transformers import pipeline
import numpy as np

import time


transcriber_gpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device = "mps")
transcriber_cpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device = "cpu")

def track_time(func, *args, **kwargs):
    start = time.time()
    output = func(*args, **kwargs)
    end = time.time()
    return output, end - start


def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim == 2:  # Check if there are two channels
        y = np.mean(y, axis=1)  # Convert to mono by taking the mean of the two channels
    y /= np.max(np.abs(y))

    out_gpu = track_time(transcriber_gpu, {"sampling_rate": sr, "raw": y})
    out_cpu = track_time(transcriber_cpu, {"sampling_rate": sr, "raw": y})

    print(out_gpu)
    print(out_cpu)
    text_gpu = out_gpu[0]["text"]
    text_cpu = out_cpu[0]["text"]
    time_gpu = out_gpu[1]
    time_cpu = out_cpu[1]

    combined_output = f"""
    OUTPUT_GPU t={time_gpu}
    {text_gpu}

    OUTPUT_CPU t={time_cpu}
    {text_cpu}
        
    """
    
    return combined_output


demo = gr.Interface(
    transcribe,
    gr.Audio(),
    "text",
)

demo.launch()

@0x0elliot
Copy link

Any progress?

int8")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 145, in __init__
    self.model = ctranslate2.models.Whisper(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unsupported device mps

@Francyrad
Copy link

Any news? The error is still present.

@sagatake
Copy link

sagatake commented Nov 17, 2024

Hi, just for your information: you can run Whisper in an almost identical way by switching to transformers. I've confirmed that it works on my MacBook Pro with Apple silicon.
https://huggingface.co/openai/whisper-large-v3

@andrewguy9
Copy link

Hi, just for your information: you can run Whisper in an almost identical way by switching to transformers. I've confirmed that it works on my MacBook Pro with Apple silicon. https://huggingface.co/openai/whisper-large-v3

@sagatake, would you mind pasting a small example? I'd like to verify mps is working.

@sagatake
Copy link

Hi, just for your information: you can run Whisper in an almost identical way by switching to transformers. I've confirmed that it works on my MacBook Pro with Apple silicon. https://huggingface.co/openai/whisper-large-v3

@sagatake, would you mind pasting a small example? I'd like to verify mps is working.

@andrewguy9

Here is a minimal example.

import torch

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

def main():
    
    test_audio_path = r"test.wav"

    # device = "cuda:0" if torch.cuda.is_available() else "cpu"
    device = "mps" if torch.backends.mps.is_available() else "cpu"    

    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    
    model_id = "openai/whisper-large-v3"
    
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    
    processor = AutoProcessor.from_pretrained(model_id)
    
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )
    
    result = pipe(test_audio_path)
    print(result["text"])
        
if __name__ == '__main__':
    main()

@SamuelAierizer
Copy link

It would be really cool if something like this worked:

whisper <filename> --language <language> --model large --output_format txt --device mps
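
Until that CLI invocation works, the closest Python-API equivalent looks roughly like the sketch below; the file name, language, and model size are placeholders, fp16 is turned off because FP16 on MPS is what triggers the broadcast errors earlier in the thread, and, as those reports show, this may still fail depending on the PyTorch and Whisper versions.

import whisper

# Load the model directly on MPS; fall back to CPU if this raises.
model = whisper.load_model("large", device="mps")

# fp16=False avoids the f16/f32 broadcast errors reported above.
result = model.transcribe("filename.mp3", language="en", fp16=False)
print(result["text"])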
