Uses MPS (Mac acceleration) by default when available #382
Conversation
@dwarkeshsp have you measured any speedups compared to using the CPU? |
Doesn't this also require switching FP16 off? |
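A minimal sketch of what that looks like in practice (assuming the Python API and a placeholder audio.mp3; this is illustrative, not part of the PR): pass fp16=False to transcribe() so decoding runs in FP32 on MPS.

```python
import torch
import whisper

# Prefer MPS when the installed PyTorch build exposes it, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = whisper.load_model("base", device=device)

# fp16 defaults to True; several MPS kernels have had FP16 issues, so force FP32 here.
result = model.transcribe("audio.mp3", fp16=False)
print(result["text"])
```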
I'm getting this error when trying to use MPS: /Users/diego/.pyenv/versions/3.10.6/lib/python3.10/site-packages/whisper-1.0-py3.10.egg/whisper/decoding.py:629: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/diego/Projects/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.) Any clues? |
@DiegoGiovany Not an expert on this, but it looks like PyTorch itself is still missing some operators for MPS. See for example |
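One commonly suggested workaround for missing MPS operators is PyTorch's CPU-fallback switch; a hedged sketch (the variable must be set before torch is first imported, and audio.mp3 is a placeholder):

```python
import os

# Let PyTorch run unsupported ops (e.g. aten::repeat_interleave) on the CPU
# instead of raising; this must be set before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import whisper

model = whisper.load_model("base", device="mps")
print(model.transcribe("audio.mp3", fp16=False)["text"])
```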
Thanks for your work. I just tried this. Unfortunately, it didn't work for me on my M1 Max with 32GB. There were no errors on install, and it works fine when run without MPS: whisper audiofile_name --model medium. When I run whisper audiofile_name --model medium --device mps, here is the error I get: When I run whisper audiofile_name --model medium --device mps --fp16 False, here is the error I get: Basically the same error as @DiegoGiovany. Any ideas on how to fix it? |
+1 for me! I'm actually using an Intel Mac with Radeon Pro 560X 4 GB... |
Related |
Doesn't work for me: MBP 2015, PyTorch 1.3 stable, eGPU RX 580, macOS 12.3. I changed the code the same way as yours and switched to --device mps, but it shows an error; maybe there is still something else to change or modify. With --device cpu it works, and MPS works with other PyTorch-Metal projects. |
What's the status on this? |
I also see the same errors as others mentioned above, on an M1 Mac running arm64 Python. |
On an M1 16" MBP with 16GB running macOS 13.0.1, using this command: I'm encountering the following errors:
|
Is there any update on this, or did anyone figure out how to get it to work? |
Same problem with macOS 13.2 on a MacBook Pro M2 Max:
|
I'm getting the same error as @renderpci using the M1 Base Model:

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x512x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1] 3746 abort python3 test.py

test.py:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"]) |
FWIW I switched to the C++ port https://github.com/ggerganov/whisper.cpp/ and got a ~15x speedup compared to CPU pytorch on my M1 Pro. (But note that it doesn't have all the features/flags from the official whisper repo.) |
For us whisper.cpp is not an option:
|
The same error as @renderpci using the M2: whisper interview.mp4 --language en --model large --device mps
|
Hey @devpacdd - this should be fixed in the latest PyTorch nightly (pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu). Let me know if you still see any issues. Thanks |
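A quick way to confirm that an installed build actually exposes MPS before retrying (just a sanity-check snippet, not from the PR):

```python
import torch

print(torch.__version__)
print("MPS built:", torch.backends.mps.is_built())          # compiled with MPS support
print("MPS available:", torch.backends.mps.is_available())  # macOS version + hardware support present
```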
Still have the same error after updating. Edit: After adding
|
I was able to get it to kinda work: davabase/whisper_real_time#5 (comment) |
@manuthebyte could you please make sure you are on a recent nightly? |
Wow! When running: with the following packages in my pipenv's requirements.txt
it gets every word! While I was singing! In realtime, with maybe ~50% GPU usage on the Apple M2 Pro Max. |
Did some performance testing of MPS vs CPU on an Apple M2 Pro. I tested a 30-second clip for performance and accuracy on every version of the model, CPU vs MPS. (collapsed details tables)
CPU performs better on smaller models, and MPS performs better on larger models. A value of 1 means the transcription time equals the audio duration; a value of 2 means it takes 2 seconds to transcribe 1 second of audio. |
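A sketch of how such a real-time-factor comparison can be scripted (the model list, clip name, and 30-second duration are placeholders, not the commenter's exact harness):

```python
import time
import whisper

AUDIO = "clip_30s.wav"   # placeholder 30-second clip
DURATION = 30.0          # seconds of audio in the clip

for device in ("cpu", "mps"):
    for name in ("tiny", "base", "small", "medium"):
        model = whisper.load_model(name, device=device)
        start = time.time()
        model.transcribe(AUDIO, fp16=False)
        elapsed = time.time() - start
        # 1.0 = real time; 2.0 = two seconds of compute per second of audio
        print(f"{device}/{name}: {elapsed / DURATION:.2f}")
```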
Any progress? Or does whisper have any other means of accelerating inference? |
Great, it worked for me |
I got it working too, but on an Intel machine (5600M, i9-9980HK) and it does not seem to be doing anything. |
@KnechtNoobrecht |
https://developer.apple.com/metal/pytorch/ |
True, can also run on AMD GPUs. |
Hi, PyTorch was broken again! I have the same error message as in #382 (comment)
|
I have
To watch the transcription output live as it's inferred, I added a
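The exact change is cut off above; one built-in way to watch segments as they are decoded is the verbose flag on transcribe(), sketched here with a placeholder file:

```python
import whisper

model = whisper.load_model("base")
# verbose=True prints each segment with timestamps as soon as it is decoded
result = model.transcribe("audio.mp3", verbose=True, fp16=False)
print(result["text"])
```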
I tried to set up the env but still got errors. M1 Pro MPS, macOS 14.1.
|
Any progress?
$ whisper pie-ep91.mp3 --model small --output_format txt --device mps
Traceback (most recent call last):
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/bin/whisper", line 8, in <module>
sys.exit(cli())
^^^^^
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/transcribe.py", line 458, in cli
model = load_model(model_name, device=device, download_root=model_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/__init__.py", line 156, in load_model
return model.to(device)
^^^^^^^^^^^^^^^^
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 849, in _apply
self._buffers[key] = fn(buf)
^^^^^^^
File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31357 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:27248 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26984 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:944 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:515 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1387 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1135 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:807 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17346 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:378 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:244 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:720 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:746 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:158 [backend fallback] |
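If the failure really comes from the model's sparse alignment_heads buffer (an assumption, not confirmed in this thread), one hedged workaround for the Python API is to densify that buffer before moving the model to MPS:

```python
import whisper

# Load on CPU first; moving to MPS directly trips over the sparse buffer.
model = whisper.load_model("small", device="cpu")

# Replace the sparse alignment_heads mask with a dense copy (assumes it is the
# only sparse tensor in the checkpoint), then move everything to MPS.
model.register_buffer(
    "alignment_heads", model.alignment_heads.to_dense(), persistent=False
)
model = model.to("mps")

print(model.transcribe("audio.mp3", fp16=False)["text"])
```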
I am using Whisper via Hugging Face pipelines, where you can specify MPS as the device. My guess is that not all PyTorch operations are compatible with MPS yet, as can be seen in this issue: pytorch/pytorch#77764. For an 11-second audio clip it takes 0.81 s on CPU and 1.23 s on GPU. This is how I compare both approaches:

import gradio as gr
from transformers import pipeline
import numpy as np
import time

transcriber_gpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device="mps")
transcriber_cpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device="cpu")

def track_time(func, *args, **kwargs):
    start = time.time()
    output = func(*args, **kwargs)
    end = time.time()
    return output, end - start

def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim == 2:  # Check if there are two channels
        y = np.mean(y, axis=1)  # Convert to mono by taking the mean of the two channels
    y /= np.max(np.abs(y))

    out_gpu = track_time(transcriber_gpu, {"sampling_rate": sr, "raw": y})
    out_cpu = track_time(transcriber_cpu, {"sampling_rate": sr, "raw": y})
    print(out_gpu)
    print(out_cpu)

    text_gpu = out_gpu[0]["text"]
    text_cpu = out_cpu[0]["text"]
    time_gpu = out_gpu[1]
    time_cpu = out_cpu[1]

    combined_output = f"""
    OUTPUT_GPU t={time_gpu}
    {text_gpu}

    OUTPUT_CPU t={time_cpu}
    {text_cpu}
    """
    return combined_output

demo = gr.Interface(
    transcribe,
    gr.Audio(),
    "text",
)

demo.launch() |
Any progress?

int8")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 145, in __init__
self.model = ctranslate2.models.Whisper(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unsupported device mps |
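ctranslate2 (the engine behind faster-whisper) has no MPS backend, so "mps" is rejected outright; a hedged fallback sketch for Apple Silicon is a quantized CPU model (model size and audio file are placeholders):

```python
from faster_whisper import WhisperModel

# faster-whisper only accepts devices ctranslate2 knows about ("cpu", "cuda"),
# so on a Mac the usual choice is int8 CPU inference.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print(segment.text)
```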
Any news? The error is still present. |
Hi, just for your information, you can run whisper in an almost identical way by replacing it with transformers. |
@sagatake, would you mind pasting a small example? I'd like to verify mps is working. |
Here is the minimum example.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

def main():
    test_audio_path = r"test.wav"

    # device = "cuda:0" if torch.cuda.is_available() else "cpu"
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "openai/whisper-large-v3"

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe(test_audio_path)
    print(result["text"])

if __name__ == '__main__':
    main() |
It would be really cool if something like this would work:
|
Apple M3 Max, successfully worked. https://rewa-insights.com/t/translating-multilingual-audio-into-simplified-chinese-and-saving-to-a-text-file-with-python/432?u=rewa-evija |
Currently, Whisper defaults to using the CPU on macOS devices, even though PyTorch has introduced the Metal Performance Shaders (MPS) backend for Apple devices in its nightly releases (more info).
With my changes to __init__.py, torch checks if MPS is available when torch.device has not been specified. If it is, and CUDA is not available, then Whisper defaults to MPS.
This way, Mac users can experience speedups from their GPU by default.
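A rough sketch of that selection logic (illustrative only, not the literal diff to __init__.py; the function name is made up):

```python
import torch

def pick_device(device=None):
    # Respect an explicitly requested device.
    if device is not None:
        return device
    # Otherwise prefer CUDA, then MPS, then CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```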