
General MPS op coverage tracking issue #77764

Open
albanD opened this issue May 18, 2022 · 1,396 comments
Labels: feature (A request for a proper, new feature.) · module: mps (Related to Apple Metal Performance Shaders framework) · tracker (A tracking issue) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@albanD (Collaborator)

albanD commented May 18, 2022

This issue is a centralized place to list and track the work on adding support for new ops to the MPS backend.

PyTorch MPS Ops Project: a project to track all the ops for the MPS backend. There is a very large number of operators in PyTorch, so they are not all implemented yet. We will prioritize adding new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is used.

As ops are requested, we will add them to the "To Triage" pool. If an op has 3+ requests, and depending on its complexity and need, it will be moved to the "To be implemented" pool. If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on, i.e. one tracked in the "In progress" pool.

Link to the wiki for details on how to add these ops and example PRs.

MPS operators coverage matrix - the matrix covers most of the supported operators but is not exhaustive. Look at the "In vx.x.x" column: if the box is green, the op implementation is included in the latest release; if the box is yellow, the op implementation is in the nightly build but has not yet been included in a release. Before you comment below, please check this matrix to make sure the operator you're requesting has not already been implemented in the nightly. More details can be found in the readme.

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen

@albanD added the feature, triaged, and module: mps labels on May 18, 2022
@albanD changed the title from General MPS op coverage issue to General MPS op coverage tracking issue on May 18, 2022
@philipturner

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

@albanD (Collaborator, Author)

albanD commented May 18, 2022

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or by a combination of MPS ops). Also, required functions that are not in the hot path simply fall back to the CPU for now.

Custom shaders are mentioned here because they are something that could easily be added within the integration, but they are not something that is used today.

@pzelasko

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

  • aten::flip
  • aten::equal
  • aten::upsample_nearest1d.out
@Linux-cpp-lisp

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot just fall back to the CPU implementation without memory copy operations? (Based, of course, on my 10,000ft view of the architecture, which I'm sure is wildly oversimplified.)
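
(In the meantime, a manual fallback along these lines works; the tensor contents below are illustrative. Note that even with unified memory, PyTorch still performs an explicit cross-device copy.)

import torch

x = torch.randint(0, 10, (1000,), device="mps")

# Run the unsupported op on a CPU copy, then move the result back to MPS.
counts = torch.bincount(x.cpu()).to("mps")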

@richardburleigh

richardburleigh commented May 19, 2022

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build that merges pull request #77791, so I'm not sure whether this is included in the current build. (Edit: It's not. You need to build PyTorch yourself with the pull request, or trust an online build that includes it.)
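
(A minimal sketch of using the fallback once you have a build that includes it; the script name is illustrative. Setting the variable from Python should happen before torch is imported, so setting it in the shell is the safest option.)

import os

# Equivalent to running:  PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

# Ops without an MPS kernel now run on the CPU (with a UserWarning)
# instead of raising NotImplementedError.
x = torch.arange(6, device="mps")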

@gautierdag

Testing with some Hugging Face transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

@lhoenig (Contributor)

lhoenig commented May 20, 2022

One missing op I ran into and haven't seen mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 with the current main-branch build. However, I instead get warnings:

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On CPU it works fine. Could be #77886, I suppose.

@Willian-Zhang

Testing with some Hugging Face transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

+1
Setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]
@albanD (Collaborator, Author)

albanD commented May 20, 2022

@lhoenig could you open a new separate issue for the CPU fallback failing for you?
The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure your Tensors are contiguous before the transfer might help as a workaround.
We can continue this discussion in the new issue you will create.

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).
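
(A minimal sketch of the contiguity workaround mentioned above; the shapes are illustrative.)

import torch

t = torch.rand(4, 6).transpose(0, 1)  # transpose returns a non-contiguous view
assert not t.is_contiguous()

# Materialize a contiguous copy before moving the tensor across devices.
t_mps = t.contiguous().to("mps")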

@weiji14 (Contributor)

weiji14 commented May 20, 2022

Would like to add aten::_local_scalar_dense to the list. Also, is it possible to link to some examples in the top post showing how we can implement these ops in PyTorch? I'd love to give it a shot if it's not too hard.

@lhoenig (Contributor)

lhoenig commented May 20, 2022

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

@psobolewskiPhD

psobolewskiPhD commented May 20, 2022

I've got an unsupported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
@thipokKub

Not supported

  • aten::l1_loss_backward.grad_input
  • aten::kl_div_backward

Code

import torch
import torch.nn as nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss()  # nn.KLDivLoss() fails the same way
loss = criterion(model(X), y)
loss.backward()  # raises NotImplementedError for the backward op

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend
@tw-ilson

Trying to use affine crop from torchvision, I found that the operator aten::linspace.out does not seem to be implemented for the MPS backend.

@nicolasbeglinger

nicolasbeglinger commented May 22, 2022

Trying to use the MPS backend with PyTorch Geometric, I found that the operator aten::index.Tensor is not yet implemented.

@feesta

feesta commented May 22, 2022

Found that the operator 'aten::grid_sampler_2d' is not currently implemented for the MPS device.

@mooey5775

Would be great to add aten::adaptive_max_pool2d to the list - it seems fairly common, and for me it's useful in some point cloud architectures.

@RohanM (Contributor)

RohanM commented May 23, 2022

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

@thipyss

thipyss commented Jul 19, 2024

The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device.

@pradeepsharma

Please prioritize "isin.Tensor_Tensor_out"

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
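
(Until the op is implemented, a broadcasting-based stand-in built from ops MPS already supports can work; a sketch with illustrative values. It materializes an (n, m) boolean matrix, so it only suits modestly sized inputs.)

import torch

elements = torch.tensor([1, 2, 3, 4], device="mps")
test_values = torch.tensor([2, 4], device="mps")

# (n, 1) == (m,) broadcasts to (n, m); any() over the last dim
# reproduces torch.isin's element-wise membership test.
mask = (elements.unsqueeze(-1) == test_values).any(dim=-1)
# tensor([False, True, False, True], device='mps:0')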

@vonlaughing

Please prioritize aten::upsample_bicubic2d.out, the temporary fix doesn't work for me :(

@thibaudbrg

thibaudbrg commented Jul 21, 2024

The operator aten::_fft_r2c is not currently implemented for the MPS device 🙏

@gblssroman

+1 for isin.Tensor_Tensor_out

@robtaylor

So, a really stupid question... why do these functions need to be reimplemented for each accelerator architecture? Why isn't there a code generator/compiler for this?

@TTonnyy789

Please prioritize aten::_convert_indices_from_coo_to_csr.out 🙏

NotImplementedError: The operator aten::_convert_indices_from_coo_to_csr.out is not currently implemented for the MPS device.

@nandinimundra

Hi, I want to use the Gemma-2 9B model for a generation task. I am running the code on an M3 Pro chip, but I am getting this error:

Complete error message:

NotImplementedError: The operator 'aten::isin.Tensor_Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Please let me know how to fix this.
Thanks in Advance.
Best,
Nandini

@tcoroller

Hello, I ran into this issue running on mps.

The operator 'aten::_logcumsumexp' is not currently implemented for the MPS device.

We developed a deep survival library and we rely on the function below for our Cox model (see code here).

Would it be possible to add torch.logcumsumexp() to the priority list?

Thanks!
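
(For reference, a rough sketch of routing just the missing op through the CPU while keeping the rest of the computation on MPS; the shapes, sorting convention, and names below are illustrative, not the library's actual code. Autograd tracks gradients across the device transfers.)

import torch

log_h = torch.randn(100, device="mps")        # risk scores, assumed sorted by time
events = torch.rand(100, device="mps") > 0.5  # event indicator

# CPU detour for the unsupported op, result moved back to MPS.
log_denom = torch.logcumsumexp(log_h.cpu(), dim=0).to("mps")
neg_log_partial_lik = -(log_h - log_denom)[events].mean()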

@poldrack

here is a vote for aten::slow_conv3d_forward

@qqaatw (Collaborator)

qqaatw commented Jul 24, 2024

Hello, I ran into this issue running on mps.

The operator 'aten::_logcumsumexp' is not currently implemented for the MPS device.

We developed a deep survival library and we rely on the function below for our Cox model (see code here).

Would it be possible to add torch.logcumsumexp() to the priority list?

Thanks!

done.

@qqaatw (Collaborator)

qqaatw commented Jul 24, 2024

aten::slow_conv3d_forward

done

@giovanniOfficioso

Vote for aten::linalg_lstsq.out

@jdcpni

jdcpni commented Jul 25, 2024

Please support 'aten::cumprod.out' - a basic function used heavily in lots of statistical/ML apps!

@betatim

betatim commented Jul 26, 2024

We use aten::_linalg_eigh.eigenvalues in scikit-learn's PCA implementation. So giving it a vote.


We noticed that some ops automatically fall back to the CPU (e.g. aten::linalg_svd). Is there somewhere to read up on which ops automatically fall back and which don't, and why?
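
(As a stopgap, the eigendecomposition itself can be detoured through the CPU; a minimal sketch with illustrative data.)

import torch

x = torch.rand(64, 8, device="mps")
xc = x - x.mean(dim=0)
cov = xc.T @ xc / (x.shape[0] - 1)  # sample covariance, symmetric

# Manual CPU fallback for the unsupported eigh; results move back to MPS.
eigenvalues, eigenvectors = torch.linalg.eigh(cov.cpu())
eigenvalues, eigenvectors = eigenvalues.to("mps"), eigenvectors.to("mps")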

@aaronllowe

Vote for torchscript:nms, used in quite a few CV libraries such as super-gradients or ultralytics.

@capsenz

capsenz commented Jul 31, 2024

Voting for aten::upsample_bicubic2d.out to be implemented for MPS.

@kstan79

kstan79 commented Aug 1, 2024

Yes, please support MPS for nms.

@GaussianGuaicai

Voting for aten::upsample_bicubic2d.out to be implemented for MPS.

@all-creator

+1 for torchvision::nms on mps

@qqaatw (Collaborator)

qqaatw commented Aug 1, 2024

torchvision::nms has been supported since last year.

@FiReTiTi

FiReTiTi commented Aug 3, 2024

torchvision::nms has been supported since last year.

In which version?

I just ran into this error after updating the libraries: NotImplementedError: The operator 'torchvision::nms' is not currently implemented for the MPS device.

The last upgrade was: Successfully installed torch-2.2.2 torchaudio-2.2.2 torchvision-0.17.2

@qqaatw (Collaborator)

qqaatw commented Aug 3, 2024

@FiReTiTi Since torchvision 0.16.

If it's still an issue, can you please open an issue in the torchvision repository and provide a minimal reproducer? Thank you.

@kevinjohncutler

Voting for aten::max_pool3d_with_indices for U-Net models.

@cem-sirin

Another vote for aten::upsample_bicubic2d.out from me.

@Akossimon

Yet another vote for implementing this op for MPS:

...The operator 'aten::upsample_bicubic2d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764...

@JBlitzar

JBlitzar commented Aug 4, 2024

Voting for aten::_linalg_eigvals

NotImplementedError: The operator 'aten::_linalg_eigvals' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on #77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

for torcheval's FrechetInceptionDistance.
