
Uses MPS (Mac acceleration) by default when available #382

Open
wants to merge 4 commits into main

Conversation

dwarkeshsp

Currently, Whisper defaults to the CPU on macOS devices, even though PyTorch has introduced the Metal Performance Shaders (MPS) backend for Apple devices in its nightly releases (more info).

With my changes to __init__.py, torch checks whether MPS is available when no torch.device has been specified. If MPS is available and CUDA is not, Whisper defaults to MPS.

This way, Mac users can experience speedups from their GPU by default.
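
For readers who want the gist without opening the diff, the selection logic described above amounts to roughly the following sketch (assumed to live in whisper/__init__.py's load_model; not necessarily the exact code in this PR):

import torch

def _default_device() -> str:
    # Prefer CUDA when present, then MPS on Apple hardware, otherwise CPU.
    if torch.cuda.is_available():
        return "cuda"
    if getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# load_model(name, device=None) would then fall back to:
# device = device if device is not None else _default_device()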

@usergit

usergit commented Oct 21, 2022

@dwarkeshsp have you measured any speedups compared to using the CPU?

@Michcioperz

Doesn't this also require switching FP16 off?

@DiegoGiovany

DiegoGiovany commented Nov 9, 2022

I'm getting this error when trying to use MPS:

/Users/diego/.pyenv/versions/3.10.6/lib/python3.10/site-packages/whisper-1.0-py3.10.egg/whisper/decoding.py:629: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/diego/Projects/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/2d9b4df9-4b93-11ed-b0fc-2e32217d8374/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 23200 bytes
'
Abort trap: 6
/Users/diego/.pyenv/versions/3.10.6/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

any clues?

@glangford

@DiegoGiovany Not an expert on this, but it looks like PyTorch itself is missing some operators for MPS. See for example
pytorch/pytorch#77764 (comment)
(which refers to repeat_interleave)

and
pytorch/pytorch#87219
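
One workaround that comes up in those PyTorch threads is letting unsupported MPS ops fall back to the CPU via the PYTORCH_ENABLE_MPS_FALLBACK environment variable (note it does not address the MPSNDArray buffer assertion seen above). A minimal sketch:

import os

# Must be set before torch is imported for the fallback to take effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import whisper

model = whisper.load_model("base", device="mps")
print(model.transcribe("audio.mp3")["text"])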

@gltanaka

gltanaka commented Nov 17, 2022

Thanks for your work. I just tried this. Unfortunately, it didn't work for me on my M1 Max with 32GB.
Here is what I did:
pip install git+https://github.com/openai/whisper.git@refs/pull/382/head

No errors on install, and it works fine when run without MPS: whisper audiofile_name --model medium

When I run: whisper audiofile_name --model medium --device mps

Here is the error I get:
Detecting language using up to the first 30 seconds. Use --language to specify the language
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1024x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

When I run: whisper audiofile_name --model medium --device mps --fp16 False

Here is the error I get:
Detecting language using up to the first 30 seconds. Use --language to specify the language
Detected language: English
/anaconda3/lib/python3.9/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/f0468ab4-4115-11ed-8edc-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1007280 bytes

Basically, same error as @DiegoGiovany.

Any ideas on how to fix?

@megeek

megeek commented Nov 28, 2022

+1 for me! I'm actually using an Intel Mac with Radeon Pro 560X 4 GB...


@PhDLuffy

PhDLuffy commented Dec 8, 2022

@dwarkeshsp

It doesn't work on my MBP 2015 (eGPU RX 580, macOS 12.3, PyTorch 1.3 stable).

I changed the code to match yours.

Changed to use --device mps, but it shows an error; maybe there is still something to change or modify.

With --device cpu, it works.

With other pytorch-metal projects, MPS works.

@changeling

What's the status on this?

@jongwook
Collaborator

I also see the same errors as others mentioned above, on an M1 Mac running arm64 Python.

@changeling

changeling commented Jan 19, 2023

On an M1 16" MBP with 16GB running macOS 13.0.1, I'm seeing the following with openai-whisper-20230117:

Using this command:
(venv) whisper_ai_playground % whisper './test_file.mp3' --model tiny.en --output_dir ./output --device mps

I'm encountering the following errors:

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x384x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible

LLVM ERROR: Failed to infer result type(s).

zsh: abort whisper --model tiny.en --output_dir ./output --device mps

  warnings.warn('resource_tracker: There appear to be %d '
@sachit-menon

Is there any update on this, or did anyone figure out how to get it to work?

@renderpci

renderpci commented Feb 5, 2023

Same problem with macOS 13.2 on a MacBook Pro M2 Max:

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      whisper audio.wav --language en --model large
m2@Render ~ % /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
@DontEatOreo

I'm getting the same error as @renderpci using the M1 Base Model

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x512x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
[1]    3746 abort      python3 test.py

test.py:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
@saurabhsharan

saurabhsharan commented Feb 6, 2023

FWIW I switched to the C++ port https://github.com/ggerganov/whisper.cpp/ and got a ~15x speedup compared to CPU pytorch on my M1 Pro. (But note that it doesn't have all the features/flags from the official whisper repo.)

@renderpci

renderpci commented Feb 6, 2023

FWIW I switched to the C++ port https://github.com/ggerganov/whisper.cpp/

For us whisper.cpp is not an option:

Should I use whisper.cpp in my project?

whisper.cpp is a hobby project. It does not strive to provide a production ready implementation. The main goals of the implementation is to be educational, minimalistic, portable, hackable and performant. There are no guarantees that the implementation is correct and bug-free and stuff can break at any point in the future. Support and updates will depend mostly on contributions, since with time I will move on and won't dedicate too much time on the project.

If you plan to use whisper.cpp in your own project, keep in mind the above.
My advice is to not put all your eggs into the whisper.cpp basket.

@devpacdd

devpacdd commented Feb 7, 2023

The same error as @renderpci using the M2

whisper interview.mp4 --language en --model large --device mps

loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x1280x3000xf16>' and 'tensor<1xf32>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort      whisper interview.mp4 --language en --model large --device mps
pac@dd ~ % /opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
@DenisVieriu97

DenisVieriu97 commented Feb 21, 2023

Hey @devpacdd - this should be fixed in latest pytorch nightly (pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu). Let me know if you still see any issues. Thanks

@manuthebyte

manuthebyte commented Feb 21, 2023

Still have the same error after updating

Edit: After adding --fp16 False to the command, I now get a new error, as well as the old one:

/opt/homebrew/lib/python3.10/site-packages/whisper/decoding.py:633: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  audio_features = audio_features.repeat_interleave(self.n_group, dim=0)
/AppleInternal/Library/BuildRoots/5b8a32f9-5db2-11ed-8aeb-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:794: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 1007280 bytes
'
zsh: abort      whisper --model large --language de --task transcribe  --device mps --fp16
/opt/homebrew/Cellar/python@3.10/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
@cameronbergh

I was able to get it to kind of work: davabase/whisper_real_time#5 (comment)

@DenisVieriu97

The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
audio_features = audio_features.repeat_interleave(self.n_group, dim=0)

@manuthebyte could you please make sure you are on a recent nightly? repeat_interleave should be natively supported. If you could grab today's nightly and give it a try, that would be awesome! (You can get today's nightly with pip3 install --pre --force-reinstall torch==2.0.0.dev20230224 --index-url https://download.pytorch.org/whl/nightly/cpu)

@cameronbergh

cameronbergh commented Feb 25, 2023

Wow!

when running:
Python3 transcribe_demo.py --model medium (from https://github.com/davabase/whisper_real_time)

with the following packages in my pipenv's requirements.txt

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyAudio==0.2.13
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230224
torchaudio==0.13.1
torchvision==0.14.1
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

It gets every word! While I was singing! In real time, with maybe ~50% GPU usage on the Apple M2 Pro Max.

@oyarsa

oyarsa commented Mar 20, 2023

@HFrost0, yeah, you're in the clear. Maybe this is a problem with some operations the model uses? Let's hope it gets better over time.

@linroex

linroex commented Mar 21, 2023

I have the same error message as #382 (comment):

Traceback (most recent call last):
  File "/Users/linroex/Projects/linroex/whisperAI/venv/bin/whisper", line 8, in <module>
    sys.exit(cli())
  File "/Users/linroex/Projects/linroex/whisperAI/venv/lib/python3.9/site-packages/whisper/transcribe.py", line 433, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/Users/linroex/Projects/linroex/whisperAI/venv/lib/python3.9/site-packages/whisper/__init__.py", line 154, in load_model
    return model.to(device)
  File "/Users/linroex/Projects/linroex/whisperAI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/Users/linroex/Projects/linroex/whisperAI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/Users/linroex/Projects/linroex/whisperAI/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31074 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:24065 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26824 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:936 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:507 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1379 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1128 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:734 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17944 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:16786 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:506 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:373 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]

my pip freeze is:

certifi==2022.12.7
charset-normalizer==3.1.0
ffmpeg-python==0.2.0
filelock==3.10.0
future==0.18.3
idna==3.4
Jinja2==3.1.2
llvmlite==0.39.1
MarkupSafe==2.1.2
more-itertools==9.1.0
mpmath==1.3.0
networkx==3.0
numba==0.56.4
numpy==1.23.5
openai-whisper==20230314
Pillow==9.3.0
regex==2022.10.31
requests==2.28.2
sympy==1.11.1
tiktoken==0.3.1
torch==2.1.0.dev20230321
torchaudio==2.1.0.dev20230321
torchvision==0.16.0.dev20230321
tqdm==4.65.0
typing_extensions==4.5.0
urllib3==1.26.15

and python version 3.9.16, macOS 13.2.1, M1 Pro

my script:

whisper 14-20-50.mp3  --language zh --task transcribe --output_format txt --model medium --device mps --fp16 False
@NebulusIO

Error persists on latest nightly 2.1.0.dev20230323

Traceback (most recent call last):
  File "/Users/tryk/.pyenv/versions/3.10.9/bin/whisper", line 8, in <module>
    sys.exit(cli())
  File "/Users/tryk/.pyenv/versions/3.10.9/lib/python3.10/site-packages/whisper/transcribe.py", line 433, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/Users/tryk/.pyenv/versions/3.10.9/lib/python3.10/site-packages/whisper/__init__.py", line 154, in load_model
    return model.to(device)
  File "/Users/tryk/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/Users/tryk/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/Users/tryk/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].
Additional Logs
CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31074 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:24065 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26824 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:936 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:507 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1379 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1128 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:734 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:17946 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:16831 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:506 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:373 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
@ururk

ururk commented Mar 24, 2023

It looks like it is related to an issue in pytorch: pytorch/pytorch#87886

I presume it's not anything the whisper authors can fix.

@ururk

ururk commented Mar 25, 2023

I did a bit of digging into this as I was curious. Forgive me as I am not familiar at all with torch. This error is happening in the pytorch library at this point:

return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

It appears a bunch of tensors are being converted, and when it encounters a sparse_coo tensor, it cannot handle it as aten::empty.memory_format is not implemented for SparseMPS (per the error above). I was wondering if this was the only stumbling block, so I patched module.py by adding:

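# Keep sparse buffers on the CPU when the target device is MPS, since
# aten::empty.memory_format is not implemented for SparseMPS (see the error above).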
if t.is_sparse and device.type == "mps":
    return t.to('cpu', dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)

whisper knit.wav --model base --device mps

And it runs. I have an M1 10-core, 16GB RAM.

When using CPU, it uses all 10 cores, no evident GPU activity
When using --device mps the GPU is nearly 100% and it seems like some CPU is being expended

I get one error:

/opt/homebrew/lib/python3.9/site-packages/whisper/decoding.py:720: UserWarning: MPS: no support for int64 repeats mask, casting it to int32. Support has been added in macOS 13.3 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:225.)
  audio_features = audio_features.repeat_interleave(self.n_group, dim=0)

That seems pretty self-evident, not sure what impact this would have. 13.3 is rumored to be released next week.

It outputs valid transcription files. It takes twice as long as the CPU. I'm not sure whether, for some operations, torch has to use the CPU because that one tensor was converted on the CPU, or whether the rest of its operations proceed on the GPU, as Activity Monitor suggests.

GPU usage when --device mps:

@voidfel

voidfel commented Mar 27, 2023

Until this is optimized for Apple Silicon M1/M2 Pro/Max CPUs/GPUs, we can just run it on Google Colaboratory.

@xros

xros commented Apr 9, 2023

@mukulpatnaik My device is an M1 MacBook Pro. I got the same error with the latest version of whisper (v20230314); then I switched to v20230124 and everything works fine. (torch nightly version)

But it seems like MPS is slower than the CPU, as @renderpci reported, for my task:

  • cpu 3.26 s
  • mps 5.25 s
  • cpu+torch2 compile 3.31 s
  • mps+torch2 compile 4.94 s

🫠

I tried to downgrade from v20230314 to v20230124. The error was gone, but it actually returned no result.

M1 Mac, PyTorch 2.1.0 nightly build.

@xros

xros commented Apr 9, 2023

Wow!

when running: Python3 transcribe_demo.py --model medium (from https://github.com/davabase/whisper_real_time)

with the following packages in my pipenv's requirements.txt

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyAudio==0.2.13
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch==2.0.0.dev20230224
torchaudio==0.13.1
torchvision==0.14.1
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

It gets every word! While I was singing! In real time, with maybe ~50% GPU usage on the Apple M2 Pro Max.

Finally it works with MPS.

For anyone who doesn't know how to do it, do this on an M1 / M2 Mac:

pip3 uninstall openai-whisper

pip3 install git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860

pip3 install --pre --force-reinstall torch --index-url https://download.pytorch.org/whl/nightly/cpu

But in code, it returns nothing if you do the low-level operations such as https://github.com/openai/whisper#python-usage.

It works with the command line and model.transcribe("voice.mp3"), though.

result = whisper.decode(model, mel, options)
print(result.text)
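
(For reference, the low-level path being referred to, taken from the README section linked above, looks roughly like this; on MPS this is the sequence that came back empty, while transcribe() worked:)

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("voice.mp3")
audio = whisper.pad_or_trim(audio)

# make the log-Mel spectrogram and move it to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)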

Hopefully a later release of whisper will support MPS well.

@andrewguy9

Did some performance testing of MPS vs CPU on an Apple M2 Pro.

I tested a 30-second clip for performance and accuracy with every model size, on CPU and on MPS.

Here is the MPS Version:
[bar chart: m2pro_mps]

Vega-Lite spec:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "Model": "tiny.en", "Transcribe Time vs Audio Time": 0.214962963, "Perfect": false },
      { "Model": "tiny", "Transcribe Time vs Audio Time": 0.267037037, "Perfect": false },
      { "Model": "base.en", "Transcribe Time vs Audio Time": 0.2654814815, "Perfect": false },
      { "Model": "base", "Transcribe Time vs Audio Time": 0.3830740741, "Perfect": false },
      { "Model": "small.en", "Transcribe Time vs Audio Time": 0.6409259259, "Perfect": true },
      { "Model": "small", "Transcribe Time vs Audio Time": 0.7203333333, "Perfect": true },
      { "Model": "medium.en", "Transcribe Time vs Audio Time": 2.406666667, "Perfect": false },
      { "Model": "medium", "Transcribe Time vs Audio Time": 1.545925926, "Perfect": true },
      { "Model": "large", "Transcribe Time vs Audio Time": 3.214814815, "Perfect": true },
      { "Model": "large-v1", "Transcribe Time vs Audio Time": 3.090740741, "Perfect": true },
      { "Model": "large-v2", "Transcribe Time vs Audio Time": 3.181481481, "Perfect": true }
    ]
  },
  "mark": {
    "type": "bar"
  },
  "encoding": {
    "x": {
      "field": "Model",
      "type": "nominal",
      "sort": null
    },
    "y": {
      "field": "Transcribe Time vs Audio Time",
      "type": "quantitative"
    },
    "color": {
      "field": "Perfect",
      "type": "nominal",
      "scale": {
        "domain": [false, true],
        "range": ["red", "green"]
      }
    }
  }
}

Here is the CPU Version:
[bar chart: m2pro_cpu]

Vega-Lite spec:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "Model": "tiny.en", "Transcribe Time vs Audio Time": 0.101037037, "Perfect": false },
      { "Model": "tiny", "Transcribe Time vs Audio Time": 0.1236296296, "Perfect": false },
      { "Model": "base.en", "Transcribe Time vs Audio Time": 0.1630740741, "Perfect": false },
      { "Model": "base", "Transcribe Time vs Audio Time": 0.227962963, "Perfect": false },
      { "Model": "small.en", "Transcribe Time vs Audio Time": 0.4248888889, "Perfect": true },
      { "Model": "small", "Transcribe Time vs Audio Time": 0.5275925926, "Perfect": true },
      { "Model": "medium.en", "Transcribe Time vs Audio Time": 1.728296296, "Perfect": false },
      { "Model": "medium", "Transcribe Time vs Audio Time": 1.181074074, "Perfect": true },
      { "Model": "large", "Transcribe Time vs Audio Time": 2.648148148, "Perfect": true },
      { "Model": "large-v1", "Transcribe Time vs Audio Time": 2.619259259, "Perfect": true },
      { "Model": "large-v2", "Transcribe Time vs Audio Time": 2.654814815, "Perfect": true }
    ]
  },
  "mark": {
    "type": "bar"
  },
  "encoding": {
    "x": {
      "field": "Model",
      "type": "nominal",
      "sort": null
    },
    "y": {
      "field": "Transcribe Time vs Audio Time",
      "type": "quantitative"
    },
    "color": {
      "field": "Perfect",
      "type": "nominal",
      "scale": {
        "domain": [false, true],
        "range": ["red", "green"]
      }
    }
  }
}

CPU performs better on smaller models, and MPS performs better on larger models.

A value of 1 means transcription takes as long as the audio itself. A value of 2 means it takes 2 seconds to transcribe 1 second of audio.
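
As a quick worked illustration of that metric (my own arithmetic, not part of the benchmark above):

# ratio = transcription_time / audio_duration, so for the 30 s test clip:
audio_seconds = 30
for label, ratio in [("large on MPS", 3.214814815), ("large on CPU", 2.648148148)]:
    print(f"{label}: ~{audio_seconds * ratio:.0f} s to transcribe")
# large on MPS: ~96 s to transcribe
# large on CPU: ~79 s to transcribe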

@salamer

salamer commented Apr 22, 2023

Any progress? Or does whisper have any other means of accelerating inference?

@hqucsx

hqucsx commented Aug 2, 2023

@mukulpatnaik My device is an M1 MacBook Pro. I got the same error with the latest version of whisper (v20230314); then I switched to v20230124 and everything works fine. (torch nightly version)

But it seems like MPS is slower than the CPU, as @renderpci reported, for my task:

  • cpu 3.26 s
  • mps 5.25 s
  • cpu+torch2 compile 3.31 s
  • mps+torch2 compile 4.94 s

🫠

Great, it worked for me.

@KnechtNoobrecht

I got it working too, but on an Intel machine (5600M, i9-9980HK) and it does not seem to be doing anything.
It is using 40% GPU and 10% CPU, but no progress. Not even the progress bar comes up.
Can anyone reproduce?

@anvart

anvart commented Aug 26, 2023

I got it working too, but on an Intel machine (5600M, i9-9980HK) and it does not seem to be doing anything. It is using 40% GPU and 10% CPU, but no progress. Not even the progress bar comes up. Can anyone reproduce?

@KnechtNoobrecht MPS is for Apple Silicon (M1/M2); please correct me if I am wrong.

@KnechtNoobrecht

@KnechtNoobrecht MPS is for Apple Silicon (M1/M2); please correct me if I am wrong.

https://developer.apple.com/metal/pytorch/
According to their own documentation, it is not Apple Silicon exclusive.
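
(If you want to check what your own machine reports, torch exposes a couple of helpers; this works on both Apple Silicon and supported Intel/AMD-GPU Macs:)

import torch

# True if this PyTorch build was compiled with MPS support
print(torch.backends.mps.is_built())
# True if MPS is actually usable here (macOS 12.3+ and a Metal-capable GPU)
print(torch.backends.mps.is_available())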

@anvart

anvart commented Aug 27, 2023

@KnechtNoobrecht

True, it can also run on AMD GPUs.

@renderpci

Hi,

PyTorch broke again!

I have the same error message as #382 (comment):

Traceback (most recent call last):
  File "/Users/render/Library/Python/3.9/bin/whisper", line 8, in <module>
    sys.exit(cli())
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/whisper/transcribe.py", line 444, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/whisper/__init__.py", line 154, in load_model
    return model.to(device)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1161, in to
    return self._apply(convert)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 858, in _apply
    self._buffers[key] = fn(buf)
  File "/Users/render/Library/Python/3.9/lib/python/site-packages/torch/nn/modules/module.py", line 1159, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31188 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:27199 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26838 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:944 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:515 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1387 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1135 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:807 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:302 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:18627 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17268 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:379 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:245 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:744 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:772 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]


@mstephenson6

I have large-v2 running on the M1 Pro GPU (2021 MBP) today; a big thank-you to @linroex above for the pip freeze output. Starting from that and working through pip problems, I got to the requirements.txt below, installed in a fresh Python 3.11 conda environment.

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch
torchaudio
torchvision
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14

To watch the transcription output live as it's inferred, I added a sys.stderr.flush() call at lib/python3.11/site-packages/whisper/transcribe.py:175.
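
(The exact line number is version-specific; the addition described above is just a flush right after each segment is printed in verbose mode, roughly:)

import sys  # at the top of transcribe.py, if not already imported

# ... inside the transcription loop, after the segment text is printed ...
sys.stderr.flush()  # force the buffered output to appear immediately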

@chicman

chicman commented Nov 5, 2023

I tried to set up the same environment, but still got errors. M1 Pro (MPS), macOS 14.1.

Traceback (most recent call last):
  File "/Users/jm/miniconda3/bin/whisper", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/whisper/transcribe.py", line 310, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/whisper/__init__.py", line 115, in load_model
    checkpoint = torch.load(fp, map_location=device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1024, in load
    return _load(opened_zipfile,
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1432, in _load
    result = unpickler.load()
             ^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1402, in persistent_load
    typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1376, in load_tensor
    wrap_storage=restore_location(storage, location),
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 1306, in restore_location
    return default_restore_location(storage, map_location)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jm/miniconda3/lib/python3.11/site-packages/torch/serialization.py", line 394, in default_restore_location
    raise RuntimeError("don't know how to restore data location of "
RuntimeError: don't know how to restore data location of torch.storage.UntypedStorage (tagged with MPS)

The requirements.txt I used:

certifi==2022.12.7
charset-normalizer==3.0.1
ffmpeg-python==0.2.0
filelock==3.9.0
future==0.18.3
huggingface-hub==0.12.1
idna==3.4
more-itertools==9.0.0
mpmath==1.2.1
networkx==3.0rc1
numpy==1.24.2
openai-whisper @ git+https://github.com/openai/whisper.git@51c785f7c91b8c032a1fa79c0e8f862dea81b860
packaging==23.0
Pillow==9.4.0
PyYAML==6.0
regex==2022.10.31
requests==2.28.2
SpeechRecognition==3.9.0
sympy==1.11.1
tokenizers==0.13.2
torch
torchaudio
torchvision
tqdm==4.64.1
transformers==4.26.1
typing_extensions==4.4.0
urllib3==1.26.14
@salamer

salamer commented Nov 28, 2023

Any progress?

@kingname

kingname commented Dec 2, 2023

$ whisper pie-ep91.mp3 --model small --output_format txt --device mps
Traceback (most recent call last):
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/bin/whisper", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/transcribe.py", line 458, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/whisper/__init__.py", line 156, in load_model
    return model.to(device)
           ^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 849, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/Users/kingname/.local/share/virtualenvs/smart_podcast-NYiabyPE/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'SparseMPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, MPS, Meta, QuantizedCPU, QuantizedMeta, MkldnnCPU, SparseCPU, SparseMeta, SparseCsrCPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, BatchedNestedTensor, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31357 [kernel]
MPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMPS.cpp:27248 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:26984 [kernel]
QuantizedCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedCPU.cpp:944 [kernel]
QuantizedMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterQuantizedMeta.cpp:105 [kernel]
MkldnnCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMkldnnCPU.cpp:515 [kernel]
SparseCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCPU.cpp:1387 [kernel]
SparseMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseMeta.cpp:249 [kernel]
SparseCsrCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterSparseCsrCPU.cpp:1135 [kernel]
BackendSelect: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterBackendSelect.cpp:807 [kernel]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:154 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:324 [backend fallback]
Named: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:21 [kernel]
Negative: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:23 [kernel]
ZeroTensor: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:90 [kernel]
ADInplaceOrView: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHIP: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradVE: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMTIA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradMeta: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
AutogradNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:19039 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:17346 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:378 [backend fallback]
AutocastCUDA: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:244 [backend fallback]
FuncTorchBatched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:720 [backend fallback]
BatchedNestedTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:746 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:162 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:166 [backend fallback]
PythonDispatcher: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:158 [backend fallback]
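
In case it helps anyone hitting the same SparseMPS error: the traceback fails while model.to(device) converts the model's buffers, so one possible workaround (untested here, and the assumption that a sparse buffer is the culprit is mine) is to load the checkpoint on the CPU, densify any sparse buffers, and only then move the model to MPS. A minimal sketch:

import torch
import whisper

def load_whisper_on_mps(name="base"):
    # Load on CPU first so the checkpoint never touches the MPS backend directly.
    model = whisper.load_model(name, device="cpu")

    # Convert any sparse buffers to dense before moving to MPS, since the
    # SparseMPS backend has no aten::empty.memory_format kernel.
    for module in model.modules():
        for key, buf in list(module._buffers.items()):
            if buf is not None and buf.is_sparse:
                module._buffers[key] = buf.to_dense()

    return model.to("mps")

model = load_whisper_on_mps("base") if torch.backends.mps.is_available() else whisper.load_model("base")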
@juanluisrto
Copy link

I am using Whisper via Hugging Face pipelines, where you can specify MPS as the device.
However, after some tests I see that CPU-only is still faster.

My guess is that not all PyTorch operations are compatible with MPS yet, as can be seen in this issue: pytorch/pytorch#77764 (there is a fallback sketch after the code below).

For an 11-second audio clip it takes 0.81 s on CPU and 1.23 s on the GPU (MPS).

This is how I compare both approaches:

import gradio as gr
from transformers import pipeline
import numpy as np

import time


# Same model, one pipeline pinned to MPS and one to CPU
transcriber_gpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device="mps")
transcriber_cpu = pipeline("automatic-speech-recognition", model="openai/whisper-base", device="cpu")

def track_time(func, *args, **kwargs):
    # Return the function's output together with the elapsed wall-clock time.
    start = time.time()
    output = func(*args, **kwargs)
    end = time.time()
    return output, end - start


def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float32)
    if y.ndim == 2:  # Check if there are two channels
        y = np.mean(y, axis=1)  # Convert to mono by taking the mean of the two channels
    y /= np.max(np.abs(y))

    # Run the same input through both pipelines and time each one.
    out_gpu = track_time(transcriber_gpu, {"sampling_rate": sr, "raw": y})
    out_cpu = track_time(transcriber_cpu, {"sampling_rate": sr, "raw": y})

    print(out_gpu)
    print(out_cpu)
    text_gpu = out_gpu[0]["text"]
    text_cpu = out_cpu[0]["text"]
    time_gpu = out_gpu[1]
    time_cpu = out_cpu[1]

    combined_output = f"""
    OUTPUT_GPU t={time_gpu}
    {text_gpu}

    OUTPUT_CPU t={time_cpu}
    {text_cpu}
        
    """
    
    return combined_output


demo = gr.Interface(
    transcribe,
    gr.Audio(),
    "text",
)

demo.launch()
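
If the slowdown comes from individual operators that MPS does not support yet, PyTorch's opt-in CPU fallback can at least keep the whole pipeline running with the GPU device selected. This is only a sketch under that assumption: the environment variable has to be set before torch is imported, and each fallback implies extra host/device copies, which may be exactly why CPU-only ends up faster on short clips.

import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # must be set before importing torch

import torch
from transformers import pipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
    device=device,
)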