
General MPS op coverage tracking issue #77764

Open
70 of 99 tasks
albanD opened this issue May 18, 2022 · 1,242 comments
Labels
feature A request for a proper, new feature. module: mps Related to Apple Metal Performance Shaders framework triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@albanD
Collaborator

albanD commented May 18, 2022

This issue is a centralized place to list and track work on adding support for new ops to the MPS backend.

MPS operators coverage matrix - The matrix covers most of the supported operators but is not exhaustive. Before you comment below, please check the matrix to make sure the operator you're requesting has not already been implemented in nightly. More details can be found in the README.

PyTorch has a very large number of operators, and not all of them are implemented for the MPS backend yet, as it is still in the prototype phase. We will prioritize adding new operators based on user feedback. If possible, please also provide a link to the network or use case where the op is used.
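A quick standalone probe can confirm the op is still missing in your build (a sketch; substitute the op you're requesting):

import torch

assert torch.backends.mps.is_available()
x = torch.arange(6, dtype=torch.float32, device="mps")
try:
    torch.cumsum(x, dim=0)  # substitute the op you want to check
    print("op ran on MPS")
except NotImplementedError as exc:
    print("still missing on MPS:", exc)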

If you want to work on adding support for such an op, feel free to comment below to get assigned one. Please avoid picking up an op that is already being worked on or that already has a PR associated with it.

Link to the wiki for details on how to add these ops and example PRs.

Good First Issue:
Below is a list of ops that are good starting points for adding operations to the MPS backend. Please consider picking one up.

  • nn.Conv3d
  • aten::_weight_norm_interface
  • aten::max_unpool2d
  • aten::cummin.out, aten::cummax.out
  • aten::upsample_linear1d.out
  • aten::lerp.Scalar_out
  • aten::renorm

Not categorized:
These are ops that have not yet been picked up and need an MPS implementation.

  • aten::slow_conv3d_forward
  • aten::_ctc_loss
  • aten::avg_pool3d.out
  • aten::linalg_qr.out
  • aten::multilabel_margin_loss_forward
  • aten::unique_dim
  • aten::_sample_dirichlet
  • aten::_fft_r2c
  • aten::upsample_bicubic2d.out
  • aten::linalg_inv_out_helper
  • aten::bucketize
  • aten::_embedding_bag
  • aten::_standard_gamma
  • aten::_upsample_bicubic2d_aa.out
  • aten::_symeig_helper
  • aten::linalg_matrix_exp
  • aten::_nested_tensor_from_mask
  • aten::randperm.generator_out
  • aten::_fused_sdp_choice
  • aten::linalg_cholesky_ex
  • aten::scatter_reduce.two_out
  • aten::kthvalue.values
  • aten::_linalg_solve_ex.result
  • aten::grid_sampler_2d_backward
  • max_pool3d (unfinished attempt: Add mps support for maxpool3d #102148)

WIP:

  • aten::kl_div_backward (not needed)

Implemented Ops:
Ops that have MPS backend implementations.

See MPS operators coverage matrix and the readme for more details.

deprecated list

Ops not supported by MPS:
Ops that will require either the CPU fallback system or a custom Metal kernel.

  • aten::lgamma.out
  • aten::linalg_householder_product
@albanD albanD added feature A request for a proper, new feature. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: mps Related to Apple Metal Performance Shaders framework labels May 18, 2022
@albanD albanD changed the title General MPS op coverage issue General MPS op coverage tracking issue May 18, 2022
@philipturner

Are there any linear algebra ops not implemented in MPS that you have made custom shaders for? Any shaders I could "borrow" from your project (with full credit) and use in my own? Specifically, it would be helpful to have SVD and reverse-mode Cholesky operators.

@albanD
Collaborator Author

albanD commented May 18, 2022

Hey,

There are no custom shaders at the moment, as everything we needed for the basic networks we looked at was already provided by MPS (or by a combination of MPS ops). Also, required functions that are not in the hot path simply fall back to CPU for now.

Custom shaders are mentioned here because they can easily be added within the integration, but none are used today.

@pzelasko

I was testing a bunch of speech synthesis and vocoder models, and found the following operators missing so far:

  • aten::flip
  • aten::equal
  • aten::upsample_nearest1d.out

@Linux-cpp-lisp

One vote for a CPU fallback for torch.bincount.

Is there any reason, given the unified memory architecture, that every op not implemented on Metal cannot simply fall back to the CPU implementation without memory copies? (Based, of course, on my 10,000 ft view of the architecture, which I'm sure is wildly oversimplified.)
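In the meantime, an explicit round-trip through the CPU works as a stopgap (a sketch, not the eventual fallback mechanism):

import torch

x = torch.randint(0, 10, (1000,), device="mps")
counts = x.cpu().bincount().to("mps")  # run the missing op on CPU, move the result back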

@richardburleigh

richardburleigh commented May 19, 2022

Tip for everyone:

Run your script with PYTORCH_ENABLE_MPS_FALLBACK=1, which will fall back to the CPU for unsupported ops.

I'm using a custom build that merges pull request #77791, so I'm not sure if this is included in the current build (Edit: it's not. You need to build PyTorch yourself with the pull request, or trust an online build that includes it).
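For anyone unsure where the variable goes: it needs to be in the environment before PyTorch initializes, so set it at launch or at the very top of the script (a sketch; train.py is a placeholder name):

# equivalent to launching with: PYTORCH_ENABLE_MPS_FALLBACK=1 python train.py
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # set this before importing torch
import torch  # unsupported MPS ops now fall back to CPU (with a warning)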

@gautierdag

Testing with some huggingface transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

@lhoenig
Contributor

lhoenig commented May 20, 2022

One missing op I ran into that hasn't been mentioned yet is aten::_unique2.
Edit: This error goes away when passing PYTORCH_ENABLE_MPS_FALLBACK=1 with the current main branch build. However, I instead get warnings

The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)

then

The dst MTL buffer in copy_to_mps is non-contiguous (Triggered internally at  /Users/lukas/pytorch/aten/src/ATen/native/mps/operations/Copy.mm:323.)

and finally the forward pass through my model crashes with

RuntimeError: Placeholder buffer size (7493632) is not large enough to contain the Tensor storage of size 14986944

On CPU it works fine. Could be #77886, I suppose.

@Willian-Zhang

Testing with some huggingface transformers code: +1 vote for aten::cumsum.out.
I tried the fallback env var, but it doesn't seem to work for me.

+1
setting PYTORCH_ENABLE_MPS_FALLBACK=1 still results in:

NotImplementedError: Could not run 'aten::cumsum.out' with arguments from the 'MPS' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::cumsum.out' is only available for these backends: [Dense, Conjugate, UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseXPU, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

CPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterCPU.cpp:37386 [kernel]
Meta: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterMeta.cpp:31637 [kernel]
BackendSelect: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:11 [kernel]
Conjugate: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:3288 [kernel]
AutogradOther: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradCUDA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXLA: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradMPS: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradIPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradXPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradHPU: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradLazy: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse1: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse2: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
AutogradPrivateUse3: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:13238 [autograd kernel]
Tracer: registered at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/autograd/generated/TraceType_0.cpp:12585 [kernel]
AutocastCPU: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/RegisterFunctionalization_3.cpp:12118 [kernel]
PythonTLSSnapshot: registered at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]

@albanD
Collaborator Author

albanD commented May 20, 2022

@lhoenig could you open a new, separate issue for the CPU fallback failing for you?
The error seems to hint that you're moving a non-contiguous Tensor across devices. Making sure your Tensors are contiguous before the move might help as a workaround.
We can continue this discussion in the new issue you create.

@Willian-Zhang the fallback is ONLY available if you build from source right now. It will be in the nightly build tomorrow (May 21st).
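For others hitting the same copy error, the workaround looks roughly like this (a sketch):

import torch

t = torch.randn(4, 6).t()  # a transposed view is non-contiguous
t = t.contiguous()         # materialize it before changing devices
t_mps = t.to("mps")        # the copy now sees a contiguous buffer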

@weiji14
Contributor

weiji14 commented May 20, 2022

Would like to add aten::_local_scalar_dense to the list. Also, would it be possible to link to some examples in the top post showing how to implement these in PyTorch? I'd love to give it a shot if it's not too hard.

@lhoenig
Contributor

lhoenig commented May 20, 2022

@albanD Yep, making the Tensors contiguous worked. But yet another issue revealed itself. I created #77977 and #78001.

@psobolewskiPhD

psobolewskiPhD commented May 20, 2022

I've got an unsupported op: aten::grid_sampler_2d

envs/pytorch-env/lib/python3.9/site-packages/torch/nn/functional.py:4172: UserWarning: The operator 'aten::grid_sampler_2d' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
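A minimal repro that routes through this op looks roughly like this (a sketch, based on my reading of the warning):

import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4, device="mps")
grid = torch.zeros(1, 2, 2, 2, device="mps")  # (N, H_out, W_out, 2) sampling grid
out = F.grid_sample(inp, grid, mode="bilinear", align_corners=False)  # falls back to CPU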

@thipokKub

Not supported

  • aten::l1_loss_backward.grad_input
  • aten::kl_div_backward

Code

import torch
import torch.nn as nn

X, y = torch.rand(16, 10).to("mps"), torch.rand(16, 1).to("mps")
model = nn.Linear(10, 1).to("mps")
criterion = nn.L1Loss()  # nn.KLDivLoss() hits aten::kl_div_backward instead
loss = criterion(model(X), y)
loss.backward()  # raises: aten::l1_loss_backward.grad_input not implemented for MPS

Output

NotImplementedError: Could not run 'aten::l1_loss_backward.grad_input' with arguments from the 'MPS' backend

@tw-ilson

Trying to use affine crop from torchvision, I found that the operator aten::linspace.out does not seem to be implemented for the MPS backend.
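One-line repro (a sketch; on builds without the op this raises NotImplementedError):

import torch

torch.linspace(0, 1, steps=8, device="mps")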

@nicolasbeglinger

nicolasbeglinger commented May 22, 2022

Trying to use the MPS backend with PyTorch Geometric, I found the operator aten::index.Tensor is not yet implemented.

@feesta

feesta commented May 22, 2022

Found that the operator 'aten::grid_sampler_2d' is not currently implemented for the MPS device.

@mooey5775

Would be great to add aten::adaptive_max_pool2d to the list; it seems fairly common, and for me it's useful in some point cloud architectures.

@RohanM
Contributor

RohanM commented May 23, 2022

I ran into this error with aten::count_nonzero.dim_IntList (via torch.count_nonzero()). I'll take a look at implementing this op with MPS.

@succichang

Voting for aten::upsample_bicubic2d.out

@danadascalescu00

🆙 aten::isin.Tensor_Tensor_out. Thank you for your hard work!

@wuhongsheng

Voting for aten::angle

@FrederikWR

Voting for aten::isin.Tensor_Tensor_out - thanks!

@eifuentes

eifuentes commented May 14, 2024

Voting for aten::_embedding_bag, which is heavily used in recommendation systems, e.g. torchrec's own implementation.

@SimonvBaal

SimonvBaal commented May 15, 2024

Also voting for aten::isin.Tensor_Tensor_out. Thank you!

@vobecant

+1 for aten::upsample_bicubic2d.out

@25is

25is commented May 16, 2024

NotImplementedError: The operator 'aten::index_copy.out' is not currently implemented for the MPS device.

Version: 2.4.0.dev20240515

@louisfabrice13

Voting for all Conv3d-related and 3D upsampling-related operations.

@CCranney

Voting for:
NotImplementedError: The operator 'aten::_nested_tensor_from_mask_left_aligned' is not currently implemented for the MPS device

Comes up with various masking attempts in self attention modules.
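For context, the kind of code that hits it looks roughly like this (a sketch; the fast path needs eval mode and no_grad):

import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True).to("mps")
layer.eval()
x = torch.randn(2, 5, 16, device="mps")
pad_mask = torch.zeros(2, 5, dtype=torch.bool, device="mps")  # True marks padded positions
with torch.no_grad():
    out = layer(x, src_key_padding_mask=pad_mask)  # may route to aten::_nested_tensor_from_mask_left_aligned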

@giamic

giamic commented May 16, 2024

+1 for aten::_fft_r2c

@masc-it

masc-it commented May 16, 2024

Voting for: aten::grid_sampler_2d_backward
(RT-DETR model)

@danieldanciu

Voting for aten::isin.Tensor_Tensor_out

1 similar comment
@garethcthomasdev

Voting for aten::isin.Tensor_Tensor_out

@YannickDamoiseaux

Also voting for aten::isin.Tensor_Tensor_out

@johnnynunez

nms:
WARNING ⚠️ NMS time limit 2.100s exceeded

@X901

X901 commented May 19, 2024

I tried it multiple times on an M1 Ultra.

I always get
WARNING ⚠️ NMS time limit [ ]s exceeded
It's still not stable; you can't depend on it.

I hope it becomes better in the future.

@BudgieBird

BudgieBird commented May 21, 2024

Voting for 'aten::isin.Tensor_Tensor_out' as well, appreciate it!

@johnnynunez

I find it unbelievable that, with the money Apple has, they don't invest in supporting PyTorch natively with all its operations.

@pranavchaturved

pranavchaturved commented May 22, 2024 via email

They would rather invest in something of their own, which is what they are doing: ml-explore/mlx: MLX: An array framework for Apple silicon (https://github.com/ml-explore/mlx)

@johnnynunez

Until they have easy conversion with PyTorch/JAX or similar, it will stay difficult, because people usually test locally and then send jobs to the servers to train.

@JustinGuese

Voting for 'aten::isin.Tensor_Tensor_out' as well, appreciate it!

@s0l4r

s0l4r commented May 22, 2024

Voting for: aten::upsample_bicubic2d.out. Thanks!

@Club-d

Club-d commented May 23, 2024

Voting for aten::scatter_reduce.two_out. Thanks!

NotImplementedError: The operator 'aten::scatter_reduce.two_out' is not currently implemented for the MPS device

@vision0array

Voting for aten::upsample_bicubic2d.out, which is not currently implemented for the MPS device.

@Raman-Kumar
Contributor

@albanD hey, I’m interested in working on aten::max_unpool2d

@janboeye

voting for aten::amp_foreach_non_finite_check_and_unscale
