
General MPS op coverage tracking issue #77764

Open
albanD opened this issue May 18, 2022 · 1,396 comments
Labels: feature (A request for a proper, new feature.) · module: mps (Related to Apple Metal Performance Shaders framework) · tracker (A tracking issue) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@albanD (Collaborator)

albanD commented May 18, 2022

This issue is a centralized place to list and track the work on adding support for new ops to the MPS backend.

PyTorch MPS Ops Project: a project to track all the ops for the MPS backend. There is a very large number of operators in PyTorch, so they are not all implemented yet. We will prioritize adding new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is used.

As ops are requested, we will add them to the "To Triage" pool. If an op has 3+ requests, and depending on its complexity and need, it will be moved to the "To be implemented" pool. If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on, i.e. one tracked in the "In progress" pool.

Link to the wiki for details on how to add these ops and example PRs.

MPS operators coverage matrix - the matrix covers most of the supported operators but is not exhaustive. Look at the "In vx.x.x" column: if the box is green, the op implementation is included in the latest release; if the box is yellow, the op implementation is in the nightly build but has not yet been included in a release. Before you comment below, please check this matrix to make sure the operator you're requesting has not already been implemented in the nightly. More details can be found in the readme.

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen

@albanD added the feature, triaged, and module: mps labels on May 18, 2022
@albanD changed the title from General MPS op coverage issue to General MPS op coverage tracking issue on May 18, 2022
@philipturner

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

@albanD (Collaborator, Author)

albanD commented May 18, 2022

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or by a combination of MPS ops). Also, required functions that are not in the hot path simply fall back to the CPU for now.

Custom shaders are mentioned here because they are something that could easily be added within the integration, but they are not something that is used today.

@pzelasko

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

  • aten::flip
  • aten::equal
  • aten::upsample_nearest1d.out
@Linux-cpp-lisp

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)
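
(In the meantime, a manual fallback along these lines works; the tensor contents below are illustrative. Note that even with unified memory, PyTorch still performs an explicit cross-device copy.)

import torch

x = torch.randint(0, 10, (1000,), device="mps")

# Run the unsupported op on a CPU copy, then move the result back to MPS.
counts = torch.bincount(x.cpu()).to("mps")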

@richardburleigh

richardburleigh commented May 19, 2022

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build that merges pull request #77791, so I'm not sure whether this is included in the current build. (Edit: It's not. You need to build PyTorch yourself with the pull request, or trust an online build that includes it.)
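
(A minimal sketch of using the fallback once you have a build that includes it; the script name is illustrative. Setting the variable from Python should happen before torch is imported, so setting it in the shell is the safest option.)

import os

# Equivalent to running:  PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

# Ops without an MPS kernel now run on the CPU (with a UserWarning)
# instead of raising NotImplementedError.
x = torch.arange(6, device="mps")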

@gautierdag

Testing with some Hugging Face transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

@lhoenig (Contributor)

lhoenig commented May 20, 2022

One missing op I ran into and haven't seen mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 with the current main-branch build. However, I instead get warnings:

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On CPU it works fine. Could be #77886, I suppose.

@Willian-Zhang

Testing with some Hugging Face transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

+1
Setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]
@albanD (Collaborator, Author)

albanD commented May 20, 2022

@lhoenig could you open a new separate issue for the CPU fallback failing for you?
The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure your Tensors are contiguous before the transfer might help as a workaround.
We can continue this discussion in the new issue you will create.

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).
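
(A minimal sketch of the contiguity workaround mentioned above; the shapes are illustrative.)

import torch

t = torch.rand(4, 6).transpose(0, 1)  # transpose returns a non-contiguous view
assert not t.is_contiguous()

# Materialize a contiguous copy before moving the tensor across devices.
t_mps = t.contiguous().to("mps")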

@weiji14 (Contributor)

weiji14 commented May 20, 2022

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post showing how we can implement these ops in PyTorch? I'd love to give it a shot if it's not too hard.

@lhoenig (Contributor)

lhoenig commented May 20, 2022

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

@psobolewskiPhD

psobolewskiPhD commented May 20, 2022

I've got an unsupported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
@thipokKub

Not supported

  • aten::l1_loss_backward.grad_input
  • aten::kl_div_backward

Code

import torch
import torch.nn as nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss()  # nn.KLDivLoss() fails the same way
loss = criterion(model(X), y)
loss.backward()  # raises NotImplementedError for the backward op

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend
@tw-ilson

Trying to use affine crop from torchvision, I found that the operator aten::linspace.out does not seem to be implemented for the MPS backend.

@nicolasbeglinger

nicolasbeglinger commented May 22, 2022

Trying to use the MPS backend with PyTorch Geometric, I found that the operator aten::index.Tensor is not yet implemented.

@feesta

feesta commented May 22, 2022

Found that the operator 'aten::grid_sampler_2d' is not currently implemented for the MPS device.

@mooey5775

Would be great to add aten::adaptive_max_pool2d to the list - it seems fairly common, and for me it's useful in some point cloud architectures.

@RohanM (Contributor)

RohanM commented May 23, 2022

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

@thipyss

thipyss commented Jul 19, 2024

The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device.

@pradeepsharma

Please prioritize "isin.Tensor_Tensor_out"

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
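
(Until the op is implemented, a broadcasting-based stand-in built from ops MPS already supports can work; a sketch with illustrative values. It materializes an (n, m) boolean matrix, so it only suits modestly sized inputs.)

import torch

elements = torch.tensor([1, 2, 3, 4], device="mps")
test_values = torch.tensor([2, 4], device="mps")

# (n, 1) == (m,) broadcasts to (n, m); any() over the last dim
# reproduces torch.isin's element-wise membership test.
mask = (elements.unsqueeze(-1) == test_values).any(dim=-1)
# tensor([False, True, False, True], device='mps:0')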

@vonlaughing

Please prioritize aten::upsample_bicubic2d.out, the temporary fix doesn't work for me :(

@thibaudbrg

thibaudbrg commented Jul 21, 2024

The operator aten::_fft_r2c is not currently implemented for the MPS device 🙏

@gblssroman

+1 for isin.Tensor_Tensor_out

@robtaylor

So, a really stupid question... why do these functions need to be reimplemented for each accelerator architecture? Why isn't there a code generator/compiler for this?

@TTonnyy789

Please prioritize aten::_convert_indices_from_coo_to_csr.out 🙏

NotImplementedError: The operator aten::_convert_indices_from_coo_to_csr.out is not currently implemented for the MPS device.

@nandinimundra

Hi, I want to use the Gemma-2 9B model for a generation task. I am running the code on an M3 Pro chip, but I am getting this error:

Complete error message:

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Please let me know how to fix this.
Thanks in Advance.
Best,
Nandini

@tcoroller

Hello, I ran into this issue running on mps.

The operator 'aten::_logcumsumexp' is not currently implemented for the MPS device.

We developed a deep survival library and we rely on the function below for our Cox model (see code here).

Would it be possible to add torch.logcumsumexp() to the priority list?

Thanks!
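
(For reference, a rough sketch of routing just the missing op through the CPU while keeping the rest of the computation on MPS; the shapes, sorting convention, and names below are illustrative, not the library's actual code. Autograd tracks gradients across the device transfers.)

import torch

log_h = torch.randn(100, device="mps")        # risk scores, assumed sorted by time
events = torch.rand(100, device="mps") > 0.5  # event indicator

# CPU detour for the unsupported op, result moved back to MPS.
log_denom = torch.logcumsumexp(log_h.cpu(), dim=0).to("mps")
neg_log_partial_lik = -(log_h - log_denom)[events].mean()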

@poldrack

here is a vote for aten::slow_conv3d_forward

@qqaatw (Collaborator)

qqaatw commented Jul 24, 2024

Hello, I ran into this issue running on mps.

The operator 'aten::_logcumsumexp' is not currently implemented for the MPS device.

We developed a deep survival library and we rely on the function below for our Cox model (see code here).

Would it be possible to add torch.logcumsumexp() to the priority list?

Thanks!

done.

@qqaatw (Collaborator)

qqaatw commented Jul 24, 2024

aten::slow_conv3d_forward

done

@giovanniOfficioso

Vote for aten::linalg_lstsq.out

@jdcpni

jdcpni commented Jul 25, 2024

Please support 'aten::cumprod.out' - a basic function used heavily in lots of statistical/ML apps!

@betatim

betatim commented Jul 26, 2024

We use aten::_linalg_eigh.eigenvalues in scikit-learn's PCA implementation. So giving it a vote.


We noticed that some ops automatically fall back to the CPU (e.g. aten::linalg_svd). Is there somewhere to read up on which ops automatically fall back and which don't, and why?
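
(As a stopgap, the eigendecomposition itself can be detoured through the CPU; a minimal sketch with illustrative data.)

import torch

x = torch.rand(64, 8, device="mps")
xc = x - x.mean(dim=0)
cov = xc.T @ xc / (x.shape[0] - 1)  # sample covariance, symmetric

# Manual CPU fallback for the unsupported eigh; results move back to MPS.
eigenvalues, eigenvectors = torch.linalg.eigh(cov.cpu())
eigenvalues, eigenvectors = eigenvalues.to("mps"), eigenvectors.to("mps")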

@aaronllowe

Vote for torchscript:nms, used in quite a few CV libraries such as super-gradients or ultralytics.

@capsenz

capsenz commented Jul 31, 2024

Voting for aten::upsample_bicubic2d.out to be implemented for MPS.

@kstan79

kstan79 commented Aug 1, 2024

Yes, please support MPS for nms.

@GaussianGuaicai

Voting for aten::upsample_bicubic2d.out to be implemented for MPS.

@all-creator

+1 for torchvision::nms on mps

@qqaatw (Collaborator)

qqaatw commented Aug 1, 2024

torchvision::nms has been supported since last year.

@FiReTiTi

FiReTiTi commented Aug 3, 2024

torchvision::nms has been supported since last year.

In which version?

I just ran into this error after updating the libraries: NotImplementedError: The operator 'torchvision::nms' is not currently implemented for the MPS device.

The last upgrade was: Successfully installed torch-2.2.2 torchaudio-2.2.2 torchvision-0.17.2

@qqaatw (Collaborator)

qqaatw commented Aug 3, 2024

@FiReTiTi Since torchvision 0.16.

If it's still an issue, can you please open an issue in the torchvision repository and provide a minimal reproducer? Thank you.

@kevinjohncutler

Voting for aten::max_pool3d_with_indices for U-Net models.

@cem-sirin

Another vote for aten::upsample_bicubic2d.out from me.

@Akossimon

Yet another vote for implementing this op for MPS:

...The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764...

@JBlitzar

JBlitzar commented Aug 4, 2024

Voting for aten::_linalg_eigvals

NotImplementedError: The operator 'aten::_linalg_eigvals' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

for torcheval's FrechetInceptionDistance.
